Abstract

Let $Y$ be a nonnegative random variable with mean $\mu$ and finite positive variance $\sigma^2$, and let $Y^s$,
defined on the same space as $Y$, have the $Y$ size biased distribution, that is, the distribution characterized
by

$$E[Yf(Y)] = \mu E f(Y^s) \quad \text{for all functions } f \text{ for which these expectations exist.}$$

Under a variety of conditions on the coupling of $Y$ and $Y^s$, including combinations of boundedness and
monotonicity, concentration of measure inequalities such as

$$P\left(\frac{Y-\mu}{\sigma} \ge t\right) \le \exp\left(-\frac{t^2}{2(A+Bt)}\right) \quad \text{for all } t > 0$$

hold for some explicit $A$ and $B$. Examples include the number of relatively ordered subsequences of a
random permutation, sliding window statistics including the number of $m$-runs in a sequence of coin
tosses, the number of local maxima of a random function on a lattice, the number of urns containing
exactly one ball in an urn allocation model, the volume covered by the union of $n$ balls placed uniformly
over a volume $n$ subset of $\mathbb{R}^d$, the number of bulbs switched on at the terminal time in the so called
lightbulb process, the number of isolated vertices in the Erdős-Rényi random graph model, and the
infinitely divisible and compound Poisson distributions that satisfy a bounded moment generating function
condition.

1 Introduction 

Size biasing of random variables is essentially sampling them proportional to their size. Of the many contexts 
in which size biasing appears, perhaps the most well known is the waiting time paradox, so clearly described 
in Feller [T^, Section 1.4. Here, a paradox is generated by the fact that in choosing a time interval 'at 
random' in which to wait for, say buses, it is more likely that an interval with a longer interarrival time 
is selected. In statistical contexts it has long been known that size biasing may affect a random sample in
adverse ways, though at times this same phenomenon may also be used to correct for certain biases [21].

In the realm of normal approximation, size biasing finds a place in Stein's method (see, for instance, 
[5T] and [2]) alongside the exchangeable pair technique. The areas of application of these two techniques 
are somewhat complementary, with size biasing useful for the approximation of distributions of nonnegative 
random variables such as counts, and the exchangeable pair for mean zero variates. Though Stein's method 
has been used mostly for assessing the accuracy of normal approximation, recently related ideas have
proved successful in deriving concentration of measure inequalities, that is, deviation inequalities of
the form $P(|Y - E(Y)| \ge t\sqrt{\mathrm{Var}(Y)})$, where typically one seeks bounds that decay exponentially in $t$; for
a detailed overview of the literature on the concentration of measures, see [20]. Regarding the
use of techniques related to Stein's method to prove such inequalities, Raič obtained large deviation bounds
for certain graph related statistics in [28] using the Cramér transform, and Chatterjee [7] derived Gaussian
and Poisson type tail bounds for Hoeffding's combinatorial CLT and the net magnetization in the
Curie-Weiss model in statistical physics. While the first paper employs the Stein equation, the latter applies
constructions which are related to the exchangeable pair in Stein's method (see [32]).

For a given nonnegative random variable $Y$ with finite nonzero mean $\mu$, recall (see [15], for example) that
$Y^s$ has the $Y$-size biased distribution if

$$E[Yf(Y)] = \mu E[f(Y^s)] \quad \text{for all functions } f \text{ for which these expectations exist.} \tag{1}$$
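As a small illustration of the characterization (1), the following sketch size biases a discrete distribution by reweighting each atom $y$ by $y/\mu$ and checks the identity exactly with rational arithmetic. The uniform distribution on $\{0,1,2,3\}$ and the test function $f(y) = y^2 + 1$ are hypothetical choices made only for the example.

```python
from fractions import Fraction


def size_bias(pmf):
    """Return the size biased pmf {y: y * p(y) / mu} of a nonnegative discrete pmf."""
    mu = sum(y * p for y, p in pmf.items())
    if mu <= 0:
        raise ValueError("the mean must be positive")
    # Atoms at zero receive weight zero and are dropped.
    return {y: y * p / mu for y, p in pmf.items() if y > 0}


def expect(pmf, f):
    """Expectation of f under the pmf."""
    return sum(f(y) * p for y, p in pmf.items())


# Hypothetical example: Y uniform on {0, 1, 2, 3}.
pmf = {y: Fraction(1, 4) for y in range(4)}
biased = size_bias(pmf)
mu = expect(pmf, lambda y: y)


def f(y):
    return y * y + 1


lhs = expect(pmf, lambda y: y * f(y))  # E[Y f(Y)]
rhs = mu * expect(biased, f)           # mu * E[f(Y^s)]
```

Both sides agree for any $f$ with finite expectations, since the reweighting is exactly the Radon Nikodym factor $y/\mu$.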

Motivated by the complementary connections that exist between the exchangeable pair method and size
biasing in Stein's method, we prove the following theorem that shows the parallel persists in the area
of concentration of measures, and that size biasing can be used to derive one sided deviation results for
nonnegative variables $Y$ that can be closely coupled to a variable $Y^s$ with the $Y$ size biased distribution.
Our first result requires the coupling to be bounded.

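Theorem 1.1. Let $Y$ be a nonnegative random variable with mean and variance $\mu$ and $\sigma^2$ respectively, both
finite and positive. Suppose there exists a coupling of $Y$ to a variable $Y^s$ having the $Y$-size bias distribution
which satisfies $|Y^s - Y| \le C$ for some $C > 0$ with probability one.

If $Y^s \ge Y$ with probability one, then

$$P\left(\frac{Y-\mu}{\sigma} \le -t\right) \le \exp\left(-\frac{t^2}{2A}\right) \quad \text{for all } t > 0, \text{ where } A = C\mu/\sigma^2. \tag{2}$$

If the moment generating function $m(\theta) = E(e^{\theta Y})$ is finite at $\theta = 2/C$, then

$$P\left(\frac{Y-\mu}{\sigma} \ge t\right) \le \exp\left(-\frac{t^2}{2(A+Bt)}\right) \quad \text{for all } t > 0, \text{ where } A = C\mu/\sigma^2 \text{ and } B = C/(2\sigma). \tag{3}$$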

The monotonicity hypothesis for inequality (2), that $Y^s \ge Y$, is natural since $Y^s$ is stochastically larger
than $Y$. Therefore there always exists a coupling for which $Y^s \ge Y$. There is no guarantee, however, that for
such a monotone coupling, the difference $Y^s - Y$ is bounded. For (3) we note that the moment generating
function is finite everywhere when $Y$ is bounded. In typical examples the variable $Y$ is indexed by $n$, and
the ones we consider have the property that the ratio $\mu/\sigma^2$ remains bounded as $n \to \infty$, and $C$ does not
depend on $n$. In such cases the bound in (2) decreases at rate $\exp(-ct^2)$ for some $c > 0$, and if $\sigma \to \infty$ as
$n \to \infty$, the bound in (3) is of similar order, asymptotically.
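To make the roles of $A$ and $B$ concrete, here is a minimal sketch that evaluates the right hand sides of (2) and (3) from $\mu$, $\sigma^2$, $C$ and $t$. The function name and argument order are illustrative assumptions; the caller remains responsible for checking the hypotheses of Theorem 1.1 (a monotone coupling for the left tail, a finite moment generating function at $2/C$ for the right tail).

```python
import math


def size_bias_tail_bounds(mu, var, C, t):
    """Return (left, right): the bounds in (2) and (3) on
    P((Y - mu)/sigma <= -t) and P((Y - mu)/sigma >= t)."""
    if not (mu > 0 and var > 0 and C > 0 and t > 0):
        raise ValueError("mu, var, C and t must all be positive")
    sigma = math.sqrt(var)
    A = C * mu / var       # A = C * mu / sigma^2
    B = C / (2 * sigma)    # B = C / (2 * sigma)
    left = math.exp(-t * t / (2 * A))             # bound (2), needs Y^s >= Y
    right = math.exp(-t * t / (2 * (A + B * t)))  # bound (3), needs m(2/C) finite
    return left, right
```

For fixed $t$ the right tail bound is always the weaker of the two, because of the extra $Bt$ in the denominator; the two agree to leading order when $B \to 0$, that is, when $\sigma \to \infty$ with $C$ fixed.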

Examples covered by Theorem 1.1 are given in Section 4 and include the number of relatively ordered
subsequences of a random permutation, sliding window statistics including the number of $m$-runs in a
sequence of coin tosses, the number of local maxima of a random function on the lattice, the number of
urns containing exactly one ball in the uniform urn allocation model, the volume covered by the union of $n$
balls placed uniformly over a volume $n$ subset of $\mathbb{R}^d$, and the number of bulbs switched on at the terminal
time in the so called lightbulb problem.

In Section 5 we also consider cases where the coupling of $Y^s$ and $Y$ is unbounded, handled on a somewhat
case by case basis. Our examples include the number of isolated vertices in the Erdős-Rényi random graph
model, and some infinitely divisible and compound Poisson distributions. As Theorem 1.1 shows, additional
information is available when the coupling is monotone; this condition holds for the $m$ runs, lightbulb and
isolated vertices examples, as well as the infinitely divisible and compound Poisson distributions considered.

A number of results in Stein's method for normal approximation rest on the fact that if a variable Y 
of interest can be closely coupled to some related variable, then the distribution of Y is close to normal. 
An advantage, therefore, of the Stein method is that dependence can be handled in a direct manner, by 
the construction of couplings on the given collection of random variables related to $Y$. In [28] and [7],
ideas related to Stein's method were used to obtain concentration of measure inequalities in the presence of
dependence.

Of the two, the technique used by Chatterjee in [7], based on Stein's exchangeable pair [32], is the one
closer to the approach taken here. We say $Y, Y'$ is a $\lambda$-Stein pair if these variables are exchangeable and
satisfy the linearity condition

$$E(Y - Y' \mid Y) = \lambda Y \quad \text{for some } \lambda \in (0,1). \tag{4}$$

The $\lambda$-Stein pair is clearly the special case of the more general identity

$$E(F(Y,Y') \mid Y) = f(Y) \quad \text{for some antisymmetric function } F,$$

specialized to $F(Y,Y') = Y - Y'$ and $f(y) = \lambda y$. Chatterjee in [7] considers a pair of variables satisfying
this more general identity, and, with

$$\Delta(Y) = \tfrac{1}{2} E\big((f(Y) - f(Y'))F(Y,Y') \mid Y\big),$$

obtains a concentration of measure inequality for $Y$ under the assumption that $\Delta(Y) \le Bf(Y) + C$ for some
constants $B$ and $C$.

For normal approximation, as seems to be the case here also, the areas in which pair couplings such as
(4) apply, and those for which the size bias couplings of Theorem 1.1 succeed, appear to be somewhat disjoint. In
particular, (4) seems to be more suited to variables which arise with mean zero, while the size bias couplings
work well for variables, such as counts, which are necessarily nonnegative. Indeed, for the problems we
consider, there appears to be no natural way by which to find exchangeable pairs satisfying the conditions
of [7]. On the other hand, the size bias couplings applied here are easy to obtain.

After proving Theorem 1.1 in Section 2, in Section 3 we review the methods in [15] for the construction
of size bias couplings in the presence of dependence, and then move to the examples already mentioned.



2 Proof of the main result 

In the sequel we make use of the following inequality, which depends on the convexity of the exponential
function;

$$\frac{e^y - e^x}{y - x} = \int_0^1 e^{ty + (1-t)x}\,dt \le \int_0^1 \big(te^y + (1-t)e^x\big)\,dt = \frac{e^y + e^x}{2} \quad \text{for all } x \ne y. \tag{5}$$

We now move to the proof of Theorem 1.1.



Proof. Recall $Y^s$ is given on the same space as $Y$, and has the $Y$ size biased distribution. By (5), for all
$\theta \in \mathbb{R}$, since $|Y^s - Y| \le C$,

$$|e^{\theta Y^s} - e^{\theta Y}| \le \tfrac{1}{2}|\theta(Y^s - Y)|\,(e^{\theta Y^s} + e^{\theta Y}) \le \frac{C|\theta|}{2}(e^{\theta Y^s} + e^{\theta Y}). \tag{6}$$

Recalling that if the moment generating function $m(\theta) = E[e^{\theta Y}]$ exists in an open interval containing $\theta$ then
we may differentiate under the expectation, we obtain, applying (1) with $f(y) = e^{\theta y}$,

$$m'(\theta) = E[Ye^{\theta Y}] = \mu E[e^{\theta Y^s}]. \tag{7}$$

To prove (2), let $\theta < 0$ and note that since the coupling is monotone $\exp(\theta Y^s) \le \exp(\theta Y)$. Now (6)
yields

$$e^{\theta Y} - e^{\theta Y^s} \le C|\theta|\,e^{\theta Y}.$$

Since $Y \ge 0$ the moment generating function $m(\theta)$ exists for all $\theta \le 0$, so taking expectation and rearranging
yields

$$Ee^{\theta Y^s} \ge (1 - C|\theta|)Ee^{\theta Y} = (1 + C\theta)E(e^{\theta Y}),$$

and now, by (7),

$$m'(\theta) \ge \mu(1 + C\theta)m(\theta) \quad \text{for all } \theta < 0. \tag{8}$$
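To consider standardized deviations of $Y$, that is, deviations of $|Y - \mu|/\sigma$, let

$$M(\theta) = Ee^{\theta(Y-\mu)/\sigma} = e^{-\theta\mu/\sigma}m(\theta/\sigma). \tag{9}$$

Now rewriting (8) in terms of $M(\theta)$, we obtain for all $\theta < 0$,

$$\begin{aligned}
M'(\theta) &= -(\mu/\sigma)e^{-\theta\mu/\sigma}m(\theta/\sigma) + e^{-\theta\mu/\sigma}m'(\theta/\sigma)/\sigma \\
&\ge -(\mu/\sigma)e^{-\theta\mu/\sigma}m(\theta/\sigma) + (\mu/\sigma)e^{-\theta\mu/\sigma}\left(1 + \frac{C\theta}{\sigma}\right)m(\theta/\sigma) \\
&= (\mu/\sigma^2)C\theta M(\theta).
\end{aligned} \tag{10}$$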

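Since $M(0) = 1$, by (10),

$$-\log M(\theta) = \int_\theta^0 \frac{M'(s)}{M(s)}\,ds \ge \int_\theta^0 \frac{C\mu}{\sigma^2}s\,ds = -\frac{C\mu\theta^2}{2\sigma^2},$$

so exponentiation gives us

$$M(\theta) \le \exp\left(\frac{C\mu\theta^2}{2\sigma^2}\right) \quad \text{when } \theta < 0.$$

Hence for a fixed $t > 0$, for all $\theta < 0$,

$$P\left(\frac{Y-\mu}{\sigma} \le -t\right) = P\left(\theta\left(\frac{Y-\mu}{\sigma}\right) \ge -\theta t\right) = P\left(e^{\theta(Y-\mu)/\sigma} \ge e^{-\theta t}\right) \le e^{\theta t}M(\theta) \le \exp\left(\theta t + \frac{C\mu\theta^2}{2\sigma^2}\right). \tag{11}$$

Substituting $\theta = -t\sigma^2/(C\mu)$ into (11) completes the proof of (2).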

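Moving on to the proof of (3), taking expectation in (6) with $\theta > 0$, we obtain

$$Ee^{\theta Y^s} - Ee^{\theta Y} \le \frac{C\theta}{2}\left(Ee^{\theta Y^s} + Ee^{\theta Y}\right),$$

so in particular, when $0 < \theta < 2/C$,

$$Ee^{\theta Y^s} \le \left(\frac{1 + C\theta/2}{1 - C\theta/2}\right)Ee^{\theta Y}. \tag{12}$$

As $m(2/C) < \infty$, (7) applies and (12) yields

$$m'(\theta) \le \mu\left(\frac{1 + C\theta/2}{1 - C\theta/2}\right)m(\theta) \quad \text{for all } 0 < \theta < 2/C. \tag{13}$$

Now letting $\theta \in (0, 2\sigma/C)$, from (9) $M(\theta)$ is differentiable for all $\theta < 2\sigma/C$ and (13) yields

$$\begin{aligned}
M'(\theta) &= -(\mu/\sigma)e^{-\theta\mu/\sigma}m(\theta/\sigma) + e^{-\theta\mu/\sigma}m'(\theta/\sigma)/\sigma \\
&\le -(\mu/\sigma)e^{-\theta\mu/\sigma}m(\theta/\sigma) + (\mu/\sigma)e^{-\theta\mu/\sigma}\left(\frac{1 + C\theta/(2\sigma)}{1 - C\theta/(2\sigma)}\right)m(\theta/\sigma) \\
&= (\mu/\sigma^2)\left(\frac{C\theta}{1 - C\theta/(2\sigma)}\right)M(\theta).
\end{aligned}$$

Dividing by $M(\theta)$ we may rewrite the inequality as

$$\frac{d}{d\theta}\log M(\theta) \le (\mu/\sigma^2)\left(\frac{C\theta}{1 - C\theta/(2\sigma)}\right).$$

Noting that $M(0) = 1$, setting $A = C\mu/\sigma^2$ and $B = C/(2\sigma)$, integrating, and using $1 - Cs/(2\sigma) \ge 1 - C\theta/(2\sigma)$ for $0 \le s \le \theta$, we obtain

$$\log M(\theta) = \int_0^\theta \frac{d}{ds}\log M(s)\,ds \le (\mu/\sigma^2)\int_0^\theta \frac{Cs}{1 - C\theta/(2\sigma)}\,ds = (\mu/\sigma^2)\frac{C\theta^2}{2(1 - C\theta/(2\sigma))} = \frac{A\theta^2}{2(1 - B\theta)}.$$

Hence, for $t > 0$,

$$P\left(\frac{Y-\mu}{\sigma} \ge t\right) = P\left(e^{\theta(Y-\mu)/\sigma} \ge e^{\theta t}\right) \le e^{-\theta t}M(\theta) \le \exp\left(-\theta t + \frac{A\theta^2}{2(1 - B\theta)}\right).$$

Noting that $\theta = t/(A + Bt)$ lies in $(0, 2\sigma/C)$ for all $t > 0$, substituting this value yields the bound

$$P\left(\frac{Y-\mu}{\sigma} \ge t\right) \le \exp\left(-\frac{t^2}{2(A + Bt)}\right) \quad \text{for all } t > 0,$$

completing the proof. □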
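The two substitutions that close the proof can be checked by exact arithmetic. The sketch below, written with hypothetical function names, evaluates the Chernoff exponents $\theta t + A\theta^2/2$ (left tail, $\theta < 0$) and $-\theta t + A\theta^2/(2(1 - B\theta))$ (right tail, $0 < \theta < 1/B = 2\sigma/C$) at $\theta = -t/A = -t\sigma^2/(C\mu)$ and $\theta = t/(A + Bt)$ respectively; the values of $A$, $B$ and $t$ are arbitrary illustrative rationals.

```python
from fractions import Fraction


def left_exponent(theta, t, A):
    """Exponent theta*t + A*theta^2/2 of the left tail bound, for theta < 0."""
    if theta >= 0:
        raise ValueError("theta must be negative")
    return theta * t + A * theta ** 2 / 2


def right_exponent(theta, t, A, B):
    """Exponent -theta*t + A*theta^2 / (2*(1 - B*theta)), for 0 < theta < 1/B."""
    if not 0 < theta * B < 1:
        raise ValueError("theta must lie in (0, 1/B)")
    return -theta * t + A * theta ** 2 / (2 * (1 - B * theta))


# Arbitrary illustrative constants.
A, B = Fraction(3), Fraction(1, 2)
checks = []
for t in (Fraction(1, 4), Fraction(2), Fraction(10)):
    left = left_exponent(-t / A, t, A)
    right = right_exponent(t / (A + B * t), t, A, B)
    checks.append((t, left, right))
```

In each case the exponents reduce to $-t^2/(2A)$ and $-t^2/(2(A+Bt))$, matching (2) and (3); note that $B\theta = Bt/(A+Bt) < 1$, so the chosen $\theta$ is always admissible.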

3 Construction of size bias couplings 

In this section we will review the discussion in [15] which gives a procedure for a construction of size bias
couplings when $Y$ is a sum; the method has its roots in the work of Baldi et al. [1]. The construction depends
on being able to size bias a collection of nonnegative random variables in a given coordinate, as described
in the following definition. Letting $F$ be the distribution of $Y$, first note that the characterization (1) of the
size bias distribution $F^s$ is equivalent to the specification of $F^s$ by its Radon Nikodym derivative

$$dF^s(x) = \frac{x}{\mu}\,dF(x). \tag{14}$$

Definition 3.1. Let $\mathcal{A}$ be an arbitrary index set and let $\{X_\alpha : \alpha \in \mathcal{A}\}$ be a collection of nonnegative random
variables with finite, nonzero expectations $EX_\alpha = \mu_\alpha$ and joint distribution $dF(\mathbf{x})$. For $\beta \in \mathcal{A}$, we say that
$\mathbf{X}^\beta = \{X_\alpha^\beta : \alpha \in \mathcal{A}\}$ has the $\mathbf{X}$ size bias distribution in coordinate $\beta$ if $\mathbf{X}^\beta$ has joint distribution

$$dF^\beta(\mathbf{x}) = x_\beta\,dF(\mathbf{x})/\mu_\beta.$$

Just as (14) is related to (1), the random vector $\mathbf{X}^\beta$ has the $\mathbf{X}$ size bias distribution in coordinate $\beta$ if
and only if

$$E[X_\beta f(\mathbf{X})] = \mu_\beta E[f(\mathbf{X}^\beta)] \quad \text{for all functions } f \text{ for which these expectations exist.}$$

Now letting $f(\mathbf{X}) = g(X_\beta)$ for some function $g$ one recovers (1), showing that the $\beta$th coordinate of $\mathbf{X}^\beta$, that
is, $X_\beta^\beta$, has the $X_\beta$ size bias distribution.
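The factorization

$$P(\mathbf{X} \in d\mathbf{x}) = P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x)\,P(X_\beta \in dx)$$

of the joint distribution of $\mathbf{X}$ suggests a way to construct $\mathbf{X}$. First generate $X_\beta$, a variable with distribution
$P(X_\beta \in dx)$. If $X_\beta = x$, then generate the remaining variates $\{X_\alpha, \alpha \ne \beta\}$ with distribution
$P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x)$. Now, by the factorization of $dF(\mathbf{x})$, we have

$$dF^\beta(\mathbf{x}) = x_\beta\,dF(\mathbf{x})/\mu_\beta = P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x)\,x P(X_\beta \in dx)/\mu_\beta = P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x)\,P(X_\beta^\beta \in dx). \tag{15}$$

Hence, to generate $\mathbf{X}^\beta$ with distribution $dF^\beta$, first generate a variable $X_\beta^\beta$ with the $X_\beta$ size bias distribution,
then, when $X_\beta^\beta = x$, generate the remaining variables according to their original conditional distribution
given that the $\beta$th coordinate takes on the value $x$.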

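Definition 3.1 and the following proposition from Section 2 of [15] will be applied in the subsequent
constructions; the reader is referred there for the simple proof.

Proposition 3.1. Let $\mathcal{A}$ be an arbitrary index set, and let $\mathbf{X} = \{X_\alpha, \alpha \in \mathcal{A}\}$ be a collection of nonnegative
random variables with finite means. For any subset $B \subset \mathcal{A}$, set

$$X_B = \sum_{\beta \in B} X_\beta \quad \text{and} \quad \mu_B = EX_B.$$

Suppose $B \subset \mathcal{A}$ with $0 < \mu_B < \infty$, and for $\beta \in B$ let $\mathbf{X}^\beta$ have the $\mathbf{X}$-size biased distribution in coordinate $\beta$
as in Definition 3.1. If $\mathbf{X}^B$ has the mixture distribution

$$\mathcal{L}(\mathbf{X}^B) = \sum_{\beta \in B} \frac{\mu_\beta}{\mu_B}\,\mathcal{L}(\mathbf{X}^\beta),$$

then

$$EX_B f(\mathbf{X}) = \mu_B Ef(\mathbf{X}^B)$$

for all real valued functions $f$ for which these expectations exist. Hence, for any $A \subset \mathcal{A}$, if $f$ is a function
of $X_A = \sum_{\alpha \in A} X_\alpha$ only,

$$EX_B f(X_A) = \mu_B Ef(X_A^B) \quad \text{where } X_A^B = \sum_{\alpha \in A} X_\alpha^B. \tag{16}$$

Taking $A = B$ in (16) we have $EX_A f(X_A) = \mu_A Ef(X_A^A)$, and hence $X_A^A$ has the $X_A$-size biased distribution,
as in (1).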

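In our examples we use Proposition 3.1 and (15) to obtain a variable $Y^s$ with the size bias distribution
of $Y$, where $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$, as follows. First choose a random index $I \in \mathcal{A}$ with probability

$$P(I = \alpha) = \mu_\alpha/\mu_{\mathcal{A}}, \quad \alpha \in \mathcal{A}.$$

Next generate $X_I^I$ with the size bias distribution of $X_I$. If $I = \alpha$ and $X_\alpha^\alpha = x$, generating $\{X_\beta^\alpha : \beta \in \mathcal{A} \setminus \{\alpha\}\}$
using the (original) conditional distribution

$$P(X_\beta, \beta \ne \alpha \mid X_\alpha = x),$$

the sum $Y^s = \sum_{\alpha \in \mathcal{A}} X_\alpha^I$ has the $Y$ size biased distribution.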
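The recipe above can be sketched in the simplest setting, a sum of independent Bernoulli indicators, which is an assumption made only for this example. There the size biased version of an indicator is the constant 1, and by independence the remaining summands keep their original values, so $Y^s = Y - X_I + 1$ with $P(I = i) = p_i/\mu$. The function below verifies $E[Yf(Y)] = \mu Ef(Y^s)$ exactly by enumerating all outcomes; the probabilities and the test function are hypothetical.

```python
from fractions import Fraction
from itertools import product


def check_sum_coupling(ps, f):
    """Return (E[Y f(Y)], mu * E[f(Y^s)]) for Y a sum of independent
    Bernoulli(p_i) indicators, with Y^s built by the index-mixture recipe."""
    mu = sum(ps)
    lhs = Fraction(0)
    rhs = Fraction(0)
    for x in product((0, 1), repeat=len(ps)):
        # Probability of this configuration of indicators.
        weight = Fraction(1)
        for xi, p in zip(x, ps):
            weight *= p if xi else 1 - p
        y = sum(x)
        lhs += weight * y * f(y)
        for i, p in enumerate(ps):
            # Index I = i is chosen with probability p_i / mu; coordinate i is
            # replaced by its size biased value 1, the others are left alone.
            y_s = y - x[i] + 1
            rhs += weight * (p / mu) * f(y_s)
    return lhs, mu * rhs


ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
lhs, rhs = check_sum_coupling(ps, lambda y: y ** 3 + 2)
```

In this independent case the coupling is monotone and satisfies $|Y^s - Y| \le 1$; with dependence, the other coordinates must instead be regenerated from their conditional distribution given the new value of $X_I$.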
4 First applications: bounded couplings

We now consider the application of Theorem 1.1 to derive concentration of measure results for the number
of relatively ordered subsequences of a random permutation, the number of $m$-runs in a sequence of coin
tosses, the number of local extrema on a graph, the number of nonisolated balls in an urn allocation model,
the covered volume in a binomial coverage process, and the number of bulbs lit at the terminal time in the so
called lightbulb process. Without further mention we will use the fact that when (2) and (3) hold for some
$A$ and $B$ then they also hold when these values are replaced by any larger ones, which may also be denoted
by $A$ and $B$.

4.1 Relatively ordered sub-sequences of a random permutation 

For $n \ge m \ge 3$, let $\pi$ and $\tau$ be permutations of $V = \{1,\dots,n\}$ and $\{1,\dots,m\}$, respectively, and let

$$V_\alpha = \{\alpha, \alpha+1, \dots, \alpha+m-1\} \quad \text{for } \alpha \in V,$$

where addition of elements of $V$ is modulo $n$. We say the pattern $\tau$ appears at location $\alpha \in V$ if the values
$\{\pi(v)\}_{v \in V_\alpha}$ and $\{\tau(v)\}_{v \in V_1}$ are in the same relative order. Equivalently, the pattern $\tau$ appears at $\alpha$ if and
only if $\pi(\tau^{-1}(v) + \alpha - 1), v \in V_1$ is an increasing sequence. When $\tau = \iota_m$, the identity permutation of length
$m$, we say that $\pi$ has a rising sequence of length $m$ at position $\alpha$. Rising sequences are studied in [6] in
connection with card tricks and card shuffling.

Letting $\pi$ be chosen uniformly from all permutations of $\{1,\dots,n\}$, and $X_\alpha$ the indicator that $\tau$ appears
at $\alpha$,

$$X_\alpha(\pi(v), v \in V_\alpha) = 1\big(\pi(\tau^{-1}(1) + \alpha - 1) < \cdots < \pi(\tau^{-1}(m) + \alpha - 1)\big),$$

the sum $Y = \sum_{\alpha \in V} X_\alpha$ counts the number of $m$-element-long segments of $\pi$ that have the same relative
order as $\tau$.

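For $\alpha \in V$ we may generate $\mathbf{X}^\alpha = \{X_\beta^\alpha, \beta \in V\}$ with the $\mathbf{X} = \{X_\beta, \beta \in V\}$ distribution size biased in
direction $\alpha$, following [13]. Let $\sigma_\alpha$ be the permutation of $\{1,\dots,m\}$ for which

$$\pi(\sigma_\alpha(1) + \alpha - 1) < \cdots < \pi(\sigma_\alpha(m) + \alpha - 1),$$

and set

$$\pi^\alpha(v) = \begin{cases} \pi(\sigma_\alpha(\tau(v - \alpha + 1)) + \alpha - 1), & v \in V_\alpha \\ \pi(v), & v \notin V_\alpha. \end{cases}$$

In other words $\pi^\alpha$ is the permutation $\pi$ with the values $\pi(v), v \in V_\alpha$ reordered so that $\pi^\alpha(\gamma)$ for $\gamma \in V_\alpha$ are
in the same relative order as $\tau$. Now let

$$X_\beta^\alpha = X_\beta(\pi^\alpha(v), v \in V_\beta),$$

the indicator that $\tau$ appears at position $\beta$ in the reordered permutation $\pi^\alpha$. As $\pi^\alpha$ and $\pi$ agree except
perhaps for the $m$ values in $V_\alpha$, we have

$$X_\beta^\alpha = X_\beta(\pi(v), v \in V_\beta) \quad \text{for all } |\beta - \alpha| \ge m.$$

Hence, as

$$|Y^\alpha - Y| \le \sum_{|\beta - \alpha| \le m-1} |X_\beta^\alpha - X_\beta| \le 2m - 1, \tag{17}$$

we may take $C = 2m - 1$ as the almost sure bound on the coupling of $Y^s$ and $Y$.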

Regarding the mean $\mu$ of $Y$, clearly for any $\tau$, as all relative orders of $\pi(v), v \in V_\alpha$ are equally likely,

$$EX_\alpha = 1/m! \quad \text{and therefore} \quad \mu = n/m!. \tag{18}$$

To compute the variance, for $0 \le k \le m-1$, let $I_k$ be the indicator that $\tau(1),\dots,\tau(m-k)$ and
$\tau(k+1),\dots,\tau(m)$ are in the same relative order. Clearly $I_0 = 1$, and for rising sequences, as $\tau(j) = j$, $I_k = 1$ for
all $k$. In general for $0 \le k \le m-1$ we have $X_\alpha X_{\alpha+k} = 0$ if $I_k = 0$, as the joint event in this case demands
two different relative orders on the segment of $\pi$ of length $m - k$ of which both $X_\alpha$ and $X_{\alpha+k}$ are a function.
If $I_k = 1$ then a given, common, relative order is demanded for this same length of $\pi$, and relative orders
also for the two segments of length $k$ on which exactly one of $X_\alpha$ and $X_{\alpha+k}$ depend, and so, in total a relative
order on $m - k + 2k = m + k$ values of $\pi$, and therefore

$$EX_\alpha X_{\alpha+k} = I_k/(m+k)! \quad \text{and} \quad \mathrm{Cov}(X_\alpha, X_{\alpha+k}) = I_k/(m+k)! - 1/(m!)^2.$$

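As the relative orders of non-overlapping segments of $\pi$ are independent, now taking $n \ge 2m$, the variance
$\sigma^2$ of $Y$ is given by

$$\begin{aligned}
\sigma^2 &= \sum_{\alpha \in V} \mathrm{Var}(X_\alpha) + \sum_{\alpha \ne \beta} \mathrm{Cov}(X_\alpha, X_\beta) \\
&= \sum_{\alpha \in V} \mathrm{Var}(X_\alpha) + \sum_{\alpha \in V}\ \sum_{\beta : 1 \le |\alpha - \beta| \le m-1} \mathrm{Cov}(X_\alpha, X_\beta) \\
&= \sum_{\alpha \in V} \mathrm{Var}(X_\alpha) + 2\sum_{\alpha \in V}\sum_{k=1}^{m-1} \mathrm{Cov}(X_\alpha, X_{\alpha+k}) \\
&= n\mathrm{Var}(X_1) + 2n\sum_{k=1}^{m-1} \mathrm{Cov}(X_1, X_{1+k}) \\
&= n\left(\frac{1}{m!}\left(1 - \frac{2m-1}{m!}\right) + 2\sum_{k=1}^{m-1} \frac{I_k}{(m+k)!}\right).
\end{aligned}$$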

Clearly $\mathrm{Var}(Y)$ is maximized for the identity permutation $\tau(k) = k$, $k = 1,\dots,m$, as $I_k = 1$ for all
$1 \le k \le m-1$, and as mentioned, this case corresponds to counting the number of rising sequences. In contrast, the
variance lower bound

$$\sigma^2 \ge \frac{n}{m!}\left(1 - \frac{2m-1}{m!}\right) \tag{19}$$

is attained at the permutation

$$\tau(j) = \begin{cases} 1 & j = 1 \\ j + 1 & 2 \le j \le m-1 \\ 2 & j = m \end{cases}$$

which has $I_k = 0$ for all $1 \le k \le m-1$. In particular, the bound (3) of Theorem 1.1 holds with

$$A = \frac{2m-1}{1 - \frac{2m-1}{m!}} \quad \text{and} \quad B = \frac{2m-1}{2\sqrt{\frac{n}{m!}\left(1 - \frac{2m-1}{m!}\right)}}.$$
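For the rising sequence case $\tau = \iota_m$, where every $I_k$ equals 1 and the joint event is a single increasing run on $m + k$ positions, the mean (18) and the variance formula can be compared against exhaustive enumeration for small $n$. The sketch below is restricted to that case by assumption; the function names are illustrative, and the enumeration over all $n!$ permutations is only feasible for very small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def rising_count(pi, m):
    """Number of cyclic positions a at which pi increases on a, a+1, ..., a+m-1."""
    n = len(pi)
    return sum(
        all(pi[(a + j) % n] < pi[(a + j + 1) % n] for j in range(m - 1))
        for a in range(n)
    )


def rising_variance(n, m):
    """Variance formula with I_k = 1 for every k (requires n >= 2m)."""
    mf = factorial(m)
    overlap = sum(Fraction(1, factorial(m + k)) for k in range(1, m))
    return n * (Fraction(1, mf) * (1 - Fraction(2 * m - 1, mf)) + 2 * overlap)


def brute_force_moments(n, m):
    """Exact mean and variance of Y, enumerating all n! permutations."""
    counts = [rising_count(pi, m) for pi in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    second = Fraction(sum(c * c for c in counts), len(counts))
    return mean, second - mean ** 2


mean, var = brute_force_moments(6, 3)
```

With $n = 6$ and $m = 3$ the enumerated mean equals $n/m!$ and the enumerated variance agrees with `rising_variance(6, 3)`.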



4.2 Local Dependence 

The following lemma shows how to construct a collection of variables $\mathbf{X}^\alpha$ having the $\mathbf{X}$ distribution biased
in direction $\alpha$ when $X_\alpha$ is some function of a subset of a collection of independent random variables.

Lemma 4.1. Let $\{C_g, g \in V\}$ be a collection of independent random variables, and for each $\alpha \in V$ let
$V_\alpha \subset V$ and $X_\alpha = X_\alpha(C_g, g \in V_\alpha)$ be a nonnegative random variable with a nonzero, finite expectation.
Then if $\{C_g^\alpha, g \in V_\alpha\}$ has distribution

$$dF^\alpha(c_g, g \in V_\alpha) = \frac{X_\alpha(c_g, g \in V_\alpha)}{EX_\alpha(C_g, g \in V_\alpha)}\,dF(c_g, g \in V_\alpha)$$

and is independent of $\{C_g, g \in V\}$, letting

$$X_\beta^\alpha = X_\beta(C_g^\alpha, g \in V_\beta \cap V_\alpha,\ C_g, g \in V_\beta \cap V_\alpha^c),$$

the collection $\mathbf{X}^\alpha = \{X_\beta^\alpha, \beta \in V\}$ has the $\mathbf{X}$ distribution biased in direction $\alpha$.

Furthermore, with $I$ chosen proportional to $EX_\alpha$, independent of the remaining variables, the sum

$$Y^s = \sum_{\beta \in V} X_\beta^I$$

has the $Y$ size biased distribution, and when there exists $M$ such that $X_\alpha \le M$ for all $\alpha$,

$$|Y^s - Y| \le bM \quad \text{where } b = \max_\alpha |\{\beta : V_\beta \cap V_\alpha \ne \emptyset\}|. \tag{20}$$

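Proof. By independence, the random variables

$$\{C_g^\alpha, g \in V_\alpha\} \cup \{C_g, g \notin V_\alpha\} \quad \text{have distribution} \quad dF^\alpha(c_g, g \in V_\alpha)\,dF(c_g, g \notin V_\alpha).$$

Thus, with $\mathbf{X}^\alpha$ as given, we find

$$\begin{aligned}
EX_\alpha f(\mathbf{X}) &= \int x_\alpha f(\mathbf{x})\,dF(c_g, g \in V) \\
&= EX_\alpha \int f(\mathbf{x})\,\frac{x_\alpha\,dF(c_g, g \in V_\alpha)}{EX_\alpha(C_g, g \in V_\alpha)}\,dF(c_g, g \notin V_\alpha) \\
&= EX_\alpha \int f(\mathbf{x})\,dF^\alpha(c_g, g \in V_\alpha)\,dF(c_g, g \notin V_\alpha) \\
&= EX_\alpha\,Ef(\mathbf{X}^\alpha).
\end{aligned}$$

That is, $\mathbf{X}^\alpha$ has the $\mathbf{X}$ distribution biased in direction $\alpha$, as in Definition 3.1.

The claim on $Y^s$ follows from Proposition 3.1 and finally, since $X_\beta = X_\beta^I$ whenever $V_\beta \cap V_I = \emptyset$,

$$|Y^s - Y| \le \sum_{\beta : V_\beta \cap V_I \ne \emptyset} |X_\beta^I - X_\beta| \le bM.$$

This completes the proof. □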

4.2.1 Sliding m window statistics 

For $n \ge m \ge 1$, let $V = \{1,\dots,n\}$ considered modulo $n$, $\{C_g : g \in V\}$ i.i.d. real valued random variables,
and for each $\alpha \in V$ set

$$V_\alpha = \{v \in V : \alpha \le v \le \alpha + m - 1\}.$$

Then for $X : \mathbb{R}^m \to [0,1]$, say, Lemma 4.1 may be applied to the sum $Y = \sum_{\alpha \in V} X_\alpha$ of the $m$-dependent
sequence $X_\alpha = X(C_\alpha, \dots, C_{\alpha+m-1})$, formed by applying the function $X$ to the variables in the '$m$-window'
$V_\alpha$. As for all $\alpha$ we have $X_\alpha \le 1$ and

$$\max_\alpha |\{\beta : V_\beta \cap V_\alpha \ne \emptyset\}| = 2m - 1,$$

we may take $C = 2m - 1$ in Theorem 1.1, by Lemma 4.1.

For a concrete example let $Y$ be the number of $m$ runs of the sequence $\xi_1, \xi_2, \dots, \xi_n$ of $n$ i.i.d. Bernoulli($p$)
random variables with $p \in (0,1)$, given by $Y = \sum_{i=1}^n X_i$ where $X_i = \xi_i\xi_{i+1}\cdots\xi_{i+m-1}$, with the periodic
convention $\xi_{n+k} = \xi_k$. In [30], the authors develop smooth function bounds for normal approximation for
the case of 2-runs. Note that the construction given in Lemma 4.1 for this case is monotone, as for any $i$,
letting

$$\xi_j' = \begin{cases} \xi_j & j \notin \{i, \dots, i+m-1\} \\ 1 & j \in \{i, \dots, i+m-1\}, \end{cases}$$

the number of $m$ runs of $\{\xi_j'\}_{j=1}^n$, that is $Y^s = \sum_{k=1}^n \xi_k'\xi_{k+1}'\cdots\xi_{k+m-1}'$, is at least $Y$.
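The monotone coupling just described can be verified exactly for small $n$. The sketch below enumerates every coin sequence and every window start $I$ (uniform, since all $EX_i$ equal $p^m$), forces the chosen window to ones, and records the pair $(Y, Y^s)$ with its probability. Function names and the parameter values used afterwards are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def count_runs(xi, m):
    """Number of cyclic windows of length m consisting entirely of ones."""
    n = len(xi)
    return sum(all(xi[(i + j) % n] for j in range(m)) for i in range(n))


def coupling_table(n, m, p):
    """List of (probability, Y, Y^s) over all coin sequences and window starts I."""
    rows = []
    for xi in product((0, 1), repeat=n):
        weight = Fraction(1)
        for bit in xi:
            weight *= p if bit else 1 - p
        y = count_runs(xi, m)
        for i in range(n):  # I is uniform because every E X_i equals p^m
            forced = list(xi)
            for j in range(m):
                # Size biasing the indicator X_i forces its window to ones.
                forced[(i + j) % n] = 1
            rows.append((weight / n, y, count_runs(forced, m)))
    return rows


n, m, p = 5, 2, Fraction(1, 3)
rows = coupling_table(n, m, p)
mu = n * p ** m


def f(y):
    return y * y + 1


lhs = sum(w * y * f(y) for w, y, _ in rows)   # E[Y f(Y)]
rhs = mu * sum(w * f(ys) for w, _, ys in rows)  # mu * E[f(Y^s)]
```

The table also exhibits the two properties used in Theorem 1.1: $Y^s \ge Y$ and $Y^s - Y \le 2m - 1$ in every row.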

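For the mean of $Y$ clearly $\mu = np^m$. For the variance, now letting $n \ge 2m$ and using the fact that
non-overlapping segments of the sequence are independent,

$$\begin{aligned}
\sigma^2 &= \sum_{i=1}^n \mathrm{Var}(\xi_i\xi_{i+1}\cdots\xi_{i+m-1}) + 2\sum_{i<j} \mathrm{Cov}(\xi_i\cdots\xi_{i+m-1},\ \xi_j\cdots\xi_{j+m-1}) \\
&= np^m(1 - p^m) + 2\sum_{i=1}^n\sum_{j=1}^{m-1} \mathrm{Cov}(\xi_i\cdots\xi_{i+m-1},\ \xi_{i+j}\cdots\xi_{i+j+m-1}).
\end{aligned}$$

For the covariances,

$$\mathrm{Cov}(\xi_i\cdots\xi_{i+m-1},\ \xi_{i+j}\cdots\xi_{i+j+m-1}) = E(\xi_i\cdots\xi_{i+j-1}\,\xi_{i+j}\cdots\xi_{i+m-1}\,\xi_{i+m}\cdots\xi_{i+j+m-1}) - p^{2m} = p^{m+j} - p^{2m},$$

and therefore

$$\sigma^2 = np^m\left(1 - p^m + 2\left(\frac{p - p^m}{1 - p} - (m-1)p^m\right)\right) = np^m\left(1 + 2\frac{p - p^m}{1 - p} - (2m-1)p^m\right).$$

Hence (2) and (3) of Theorem 1.1 hold with

$$A = \frac{2m-1}{1 + 2\frac{p - p^m}{1 - p} - (2m-1)p^m} \quad \text{and} \quad B = \frac{2m-1}{2\sqrt{np^m\left(1 + 2\frac{p - p^m}{1 - p} - (2m-1)p^m\right)}}.$$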
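The closed forms for the mean and variance of the number of $m$-runs, and the resulting constants $A = C\mu/\sigma^2$ and $B = C/(2\sigma)$ with $C = 2m - 1$, can be cross-checked against direct enumeration of all $2^n$ coin sequences. This is a sketch with illustrative function names; it assumes $n \ge 2m$ and the periodic convention.

```python
import math
from fractions import Fraction
from itertools import product


def run_moments(n, m, p):
    """Closed-form mean and variance of the number of cyclic m-runs (n >= 2m)."""
    pm = p ** m
    mu = n * pm
    var = n * pm * (1 + 2 * (p - pm) / (1 - p) - (2 * m - 1) * pm)
    return mu, var


def run_constants(n, m, p):
    """Constants A = C*mu/sigma^2 and B = C/(2*sigma) with C = 2m - 1."""
    mu, var = run_moments(n, m, p)
    c = 2 * m - 1
    return c * mu / var, c / (2 * math.sqrt(var))


def enumerated_moments(n, m, p):
    """Exact mean and variance by summing over all 2^n coin sequences."""
    mean = second = Fraction(0)
    for xi in product((0, 1), repeat=n):
        weight = Fraction(1)
        for bit in xi:
            weight *= p if bit else 1 - p
        y = sum(all(xi[(i + j) % n] for j in range(m)) for i in range(n))
        mean += weight * y
        second += weight * y * y
    return mean, second - mean ** 2


p = Fraction(1, 3)
A, B = run_constants(5, 2, p)
```

Note that $A$ does not depend on $n$, while $B$ decays like $n^{-1/2}$, which is the regime discussed after Theorem 1.1.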



4.2.2 Local extrema on a lattice 

Size biasing the number of local extrema on graphs, for the purpose of normal approximation, was studied in
[1] and [13]. For a given graph $\mathcal{G} = \{V, \mathcal{E}\}$, let $\mathcal{G}_v = \{V_v, \mathcal{E}_v\}, v \in V$, be a collection of isomorphic subgraphs
of $\mathcal{G}$ such that $v \in V_v$ and for all $v_1, v_2 \in V$ the isomorphism from $\mathcal{G}_{v_1}$ to $\mathcal{G}_{v_2}$ maps $v_1$ to $v_2$. Let $\{C_g, g \in V\}$
be a collection of independent and identically distributed random variables, and let $X_v$ be defined by

$$X_v(C_w, w \in V_v) = 1(C_v > C_w, w \in V_v \setminus \{v\}), \quad v \in V.$$

Then the sum $Y = \sum_{v \in V} X_v$ counts the number of local maxima. In general one may define the neighbor
distance $d$ between two vertices $v, w \in V$ by

$$d(v,w) = \min\{n : \text{there exist } v_0, \dots, v_n \in V \text{ such that } v_0 = v, v_n = w \text{ and } (v_k, v_{k+1}) \in \mathcal{E} \text{ for } k = 0, \dots, n-1\}.$$

Then for $v \in V$ and $r = 0, 1, \dots$,

$$V_v(r) = \{w \in V : d(w,v) \le r\}$$

is the set of vertices of $V$ at distance at most $r$ from $v$. We suppose that the given isomorphic graphs are
of this form, that is, that there is some $r$ such that $V_v = V_v(r)$ for all $v \in V$. Then if $d(v_1, v_2) > 2r$, and
$(w_1, w_2) \in V_{v_1} \times V_{v_2}$, rearranging

$$2r < d(v_1, v_2) \le d(v_1, w_1) + d(w_1, w_2) + d(w_2, v_2)$$

and using $d(v_i, w_i) \le r, i = 1, 2$, yields $d(w_1, w_2) > 0$. Hence,

$$d(v_1, v_2) > 2r \text{ implies } V_{v_1} \cap V_{v_2} = \emptyset, \text{ so by (20) we may take } b = \max_v |V_v(2r)|. \tag{21}$$

For example, for $p \in \{1, 2, \dots\}$ and $n \ge 5$ consider the lattice $V = \{1, \dots, n\}^p$ modulo $n$ in $\mathbb{Z}^p$ and
$\mathcal{E} = \{\{v,w\} : d(v,w) = 1\}$; in this case $d$ is the $L^1$ norm

$$d(v,w) = \sum_{i=1}^p |v_i - w_i|.$$

Considering the case where we call vertex $v$ a local extreme value if the value $C_v$ exceeds the values $C_w$ over
the immediate neighbors $w$ of $v$, we take

$$V_v = V_v(1) \quad \text{and note that} \quad |V_v(1)| = 1 + 2p,$$

the 1 accounting for $v$ itself, and then $2p$ for the number of neighbors at distance 1 from $v$, which differ from
$v$ by either $+1$ or $-1$ in exactly one coordinate.
Lemma 4.1, (21), and $|X_v| \le 1$ yield

$$|Y^s - Y| \le \max_v |V_v(2)| = 1 + 2p + \left(2p + 4\binom{p}{2}\right) = 2p^2 + 2p + 1, \tag{22}$$

where the 1 counts $v$ itself, the $2p$ again are the neighbors at distance 1, and the term in the parenthesis
accounts for the neighbors at distance 2, $2p$ of them differing in exactly one coordinate by $+2$ or $-2$, and
$4\binom{p}{2}$ of them differing by either $+1$ or $-1$ in exactly two coordinates. Note that we have used the assumption
$n \ge 5$ here, and continue to do so below.
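The neighborhood counts $|V_v(1)| = 1 + 2p$ and $|V_v(2)| = 2p^2 + 2p + 1$ can be confirmed by brute force on a small torus. The sketch below counts the points of $\{0,\dots,n-1\}^p$ within cyclic $L^1$ distance $r$ of the origin; by translation invariance the count is the same around every vertex. The function name and the use of 0-based coordinates are illustrative choices.

```python
from itertools import product


def ball_size(p, r, n):
    """Number of points of the torus {0, ..., n-1}^p within L1 distance r of 0,
    where each coordinate difference is measured modulo n."""
    count = 0
    for v in product(range(n), repeat=p):
        distance = sum(min(c, n - c) for c in v)
        if distance <= r:
            count += 1
    return count


sizes = {p: (ball_size(p, 1, 5), ball_size(p, 2, 5)) for p in (1, 2, 3)}
```

With $n = 5$ the residues $-2, \dots, 2$ are distinct modulo $n$, which is exactly why the assumption $n \ge 5$ is needed for the distance 2 count.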

Now letting $C_v$ have a continuous distribution, without loss of generality we can assume $C_v \sim U[0,1]$. As
any vertex has chance $1/|V_v|$ of having the largest value in its neighborhood, for the mean $\mu$ of $Y$ we have

$$\mu = \frac{|V|}{2p+1}, \tag{23}$$

where $|V| = n^p$ is the number of vertices of the lattice.

To begin the calculation of the variance, note that when $v$ and $w$ are neighbors they cannot both be
maxima, so $X_vX_w = 0$ and therefore, for $d(v,w) = 1$,

$$\mathrm{Cov}(X_v, X_w) = -(EX_v)^2 = -\frac{1}{(2p+1)^2}.$$

If the distance between $v$ and $w$ is 3 or more, $X_v$ and $X_w$ are functions of disjoint sets of independent
variables, and hence are independent.

When $d(w,v) = 2$ there are two cases, as $v$ and $w$ may have either 1 or 2 neighbors in common, and

$$EX_vX_w = P(U > U_j, V > V_j, j = 1, \dots, m-k \text{ and } U > U_j, V > U_j, j = m-k+1, \dots, m),$$

where $m$ is the number of vertices over which $v$ and $w$ are extreme, so $m = 2p$, and $k = 1$ and $k = 2$ for the
number of neighbors in common. For $k = 1, 2, \dots$, letting $M_k = \max\{U_{m-k+1}, \dots, U_m\}$, as the variables $X_v$
and $X_w$ are conditionally independent given $U_{m-k+1}, \dots, U_m$,

$$E(X_vX_w \mid U_{m-k+1}, \dots, U_m) = P(U > U_j, j = 1, \dots, m \mid U_{m-k+1}, \dots, U_m)^2, \tag{24}$$

where

$$P(U > U_j, j = 1, \dots, m \mid U_{m-k+1}, \dots, U_m) = \int_{M_k}^1 \int_0^u \cdots \int_0^u du_1 \cdots du_{m-k}\,du = \int_{M_k}^1 u^{m-k}\,du = \frac{1}{m-k+1}\left(1 - M_k^{m-k+1}\right).$$

Since $P(M_k \le x) = x^k$ on $[0,1]$, we have

$$E\left(1 - M_k^{m-k+1}\right)^2 = \int_0^1 \left(1 - x^{m-k+1}\right)^2 kx^{k-1}\,dx = 1 - \frac{2k}{m+1} + \frac{k}{2m-k+2} = \frac{2(m-k+1)^2}{(m+1)(2(m+1)-k)}.$$

Hence, averaging (24) over $U_{m-k+1}, \dots, U_m$ yields

$$EX_vX_w = \frac{2}{(m+1)(2(m+1)-k)}.$$
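Because the underlying variables are i.i.d. with a continuous distribution, every ranking of the values involved is equally likely, so the joint probability $EX_vX_w = 2/((m+1)(2(m+1)-k))$ can be checked by counting rankings. The following sketch does this for small $m$ and $k$; the labeling of the variables and the function name are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def joint_max_probability(m, k):
    """P(v and w are both local maxima) when each has m neighbours, k of them
    shared and v, w not adjacent, computed by enumerating all rankings."""
    # Labels: 0 is the value at v, 1 the value at w, followed by the m - k
    # private neighbours of v, the m - k private neighbours of w, and the k
    # shared neighbours.
    private_v = range(2, 2 + (m - k))
    private_w = range(2 + (m - k), 2 + 2 * (m - k))
    shared = range(2 + 2 * (m - k), 2 + 2 * (m - k) + k)
    size = 2 + 2 * (m - k) + k
    hits = 0
    for rank in permutations(range(size)):
        v_is_max = all(rank[0] > rank[j] for j in private_v) and all(
            rank[0] > rank[j] for j in shared
        )
        w_is_max = all(rank[1] > rank[j] for j in private_w) and all(
            rank[1] > rank[j] for j in shared
        )
        if v_is_max and w_is_max:
            hits += 1
    return Fraction(hits, factorial(size))


def closed_form(m, k):
    """The value 2 / ((m + 1) * (2 * (m + 1) - k))."""
    return Fraction(2, (m + 1) * (2 * (m + 1) - k))
```

For the lattice, $m = 2p$; the case $p = 1$, $k = 1$ involves only five variables and is easy to confirm by hand.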
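For $n \ge 3$, when $m = 2p$, for $k = 1$ and 2 we obtain

$$\mathrm{Cov}(X_v, X_w) = \frac{1}{(2p+1)^2(2(2p+1)-1)} \quad \text{and} \quad \mathrm{Cov}(X_v, X_w) = \frac{2}{(2p+1)^2(2(2p+1)-2)}, \quad \text{respectively.}$$

For $n \ge 5$, of the $2p + 4\binom{p}{2}$ vertices $w$ that are at distance 2 from $v$, $2p$ of them share 1 neighbor in common
with $v$, while the remaining $4\binom{p}{2}$ of them share 2 neighbors. Hence, with $|V| = n^p$ the number of vertices,

$$\begin{aligned}
\sigma^2 &= \sum_{v \in V} \mathrm{Var}(X_v) + \sum_{v \ne w} \mathrm{Cov}(X_v, X_w) \\
&= \sum_{v \in V} \mathrm{Var}(X_v) + \sum_{d(v,w)=1} \mathrm{Cov}(X_v, X_w) + \sum_{d(v,w)=2} \mathrm{Cov}(X_v, X_w) \\
&= |V|\left(\frac{2p}{(2p+1)^2} - \frac{2p}{(2p+1)^2} + \frac{2p}{(2p+1)^2(2(2p+1)-1)} + 4\binom{p}{2}\frac{2}{(2p+1)^2(2(2p+1)-2)}\right) \\
&= |V|\,\frac{2p}{(2p+1)^2}\left(\frac{1}{2(2p+1)-1} + \frac{2(p-1)}{2(2p+1)-2}\right) \\
&= |V|\,\frac{4p^2 - p - 1}{(2p+1)^2(4p+1)}.
\end{aligned} \tag{25}$$

We conclude that (3) of Theorem 1.1 holds with $A = C\mu/\sigma^2$ and $B = C/(2\sigma)$ with $\mu$, $\sigma^2$ and $C$ given by
(23), (25) and (22), respectively, that is,

$$A = \frac{(2p+1)(4p+1)(2p^2+2p+1)}{4p^2 - p - 1} \quad \text{and} \quad B = \frac{2p^2+2p+1}{2\sqrt{|V|\,\frac{4p^2-p-1}{(2p+1)^2(4p+1)}}}.$$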


4.3 Urn allocation 

In the classical urn allocation model $n$ balls are thrown independently into one of $m$ urns, where, for
$i = 1, \dots, m$, the probability a ball lands in the $i$th urn is $p_i$, with $\sum_{i=1}^m p_i = 1$. A much studied quantity of
interest is the number of nonempty urns, for which Kolmogorov distance bounds to the normal were obtained
in [11] and [27]. In [11], bounds were obtained for the uniform case where $p_i = 1/m$ for all $i = 1, \dots, m$,
while the bounds in [27] hold for the nonuniform case as well. In [22] the author considers the normal
approximation for the number of isolated balls, that is, the number of urns containing exactly one ball, and
obtains Kolmogorov distance bounds to the normal. Using the coupling provided in [25], we derive right
tail inequalities for the number of non-isolated balls, or, equivalently, left tail inequalities for the number of
isolated balls.

For $i = 1, \dots, n$ let $X_i$ denote the location of ball $i$, that is, the number of the urn into which ball $i$ lands.
The number $Y$ of non-isolated balls is given by

$$Y = \sum_{i=1}^n 1(M_i > 0) \quad \text{where} \quad M_i = -1 + \sum_{j=1}^n 1(X_j = X_i).$$

We first consider the uniform case. A construction in [25] produces a coupling of $Y$ to $Y^s$, having the
$Y$ size biased distribution, which satisfies $|Y^s - Y| \le 2$. Given a realization of $\mathbf{X} = \{X_1, \dots, X_n\}$, the
coupling proceeds by first selecting a ball $I$, uniformly from $\{1, 2, \dots, n\}$, and independently of $\mathbf{X}$. Depending
on the outcome of a Bernoulli variable $B$, whose distribution depends on the number of balls found in the
urn containing $I$, a different ball $J$ will be imported into the urn that contains ball $I$. In some additional
detail, let $B$ be a Bernoulli variable with success probability $P(B = 1) = \pi_{M_I}$, where

$$\pi_k = \begin{cases} \dfrac{P(N > k \mid N > 0) - P(N > k)}{P(N = k)(1 - k/(n-1))} & \text{if } 0 \le k \le n-2 \\ 0 & \text{if } k = n-1, \end{cases}$$

with $N \sim \mathrm{Bin}(1/m, n-1)$. Now let $J$ be uniformly chosen from $\{1, 2, \dots, n\} \setminus \{I\}$, independent of all other
variables. Lastly, if $B = 1$, move ball $J$ into the same urn as $I$. It is clear that $|Y^s - Y| \le 2$, as at most the
occupancy of two urns can be affected by the movement of a single ball. We also note that if $M_I = 0$, which
happens when ball $I$ is isolated, $\pi_0 = 1$, so that $I$ becomes no longer isolated after relocating ball $J$. We
refer the reader to [25] for a full proof that this procedure produces a coupling of $Y$ to a variable with the
$Y$ size biased distribution.

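For the uniform case, the following explicit formulas for $\mu$ and $\sigma^2$ can be found in Theorem II.1.1 of [18],

$$\mu = n - n\left(1 - \frac{1}{m}\right)^{n-1} \quad \text{and} \quad \sigma^2 = n\left(1 - \frac{1}{m}\right)^{n-1} + \frac{(m-1)n(n-1)}{m}\left(1 - \frac{2}{m}\right)^{n-2} - n^2\left(1 - \frac{1}{m}\right)^{2n-2}. \tag{26}$$

Hence with $\mu$ and $\sigma^2$ as in (26), we can apply (3) of Theorem 1.1 for $Y$, the number of non isolated balls,
with $C = 2$, $A = 2\mu/\sigma^2$ and $B = 1/\sigma$.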
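For small $n$ and $m$ the mean and variance of the number of non-isolated balls in the uniform model can be computed by enumerating all $m^n$ allocations. The closed forms in the sketch below come from the elementary computation $P(\text{ball 1 isolated}) = (1 - 1/m)^{n-1}$ and $P(\text{balls 1 and 2 both isolated}) = (1 - 1/m)(1 - 2/m)^{n-2}$; that they coincide with the formulas (26) cited from [18] is an assumption of this example, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def non_isolated_moments(n, m):
    """Mean and variance of the number of non-isolated balls, uniform case."""
    q1 = Fraction(m - 1, m)
    q2 = Fraction(m - 2, m)
    mean_isolated = n * q1 ** (n - 1)
    # E[Z(Z - 1)] for Z the number of isolated balls.
    pair_term = n * (n - 1) * q1 * q2 ** (n - 2)
    variance = mean_isolated + pair_term - mean_isolated ** 2
    return n - mean_isolated, variance


def enumerated_moments(n, m):
    """Exact mean and variance by enumerating all m^n equally likely allocations."""
    total = total_sq = 0
    for urns in product(range(m), repeat=n):
        y = sum(1 for u in urns if urns.count(u) > 1)  # balls sharing their urn
        total += y
        total_sq += y * y
    count = m ** n
    mean = Fraction(total, count)
    return mean, Fraction(total_sq, count) - mean ** 2


mu, var = non_isolated_moments(4, 3)
A = 2 * mu / var  # the constant A = C * mu / sigma^2 with C = 2
```

The variance is the same for the isolated and the non-isolated counts, since the two sum to $n$.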

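Taking limits in (26), if $m$ and $n$ both go to infinity in such a way that $n/m \to \alpha \in (0, \infty)$, the mean $\mu$
and variance $\sigma^2$ obey

$$\mu \asymp n(1 - e^{-\alpha}) \quad \text{and} \quad \sigma^2 \asymp ng(\alpha)^2 \quad \text{where } g(\alpha)^2 = e^{-\alpha} - e^{-2\alpha}(\alpha^2 - \alpha + 1) > 0 \text{ for all } \alpha \in (0, \infty),$$

where for positive functions $f$ and $h$ depending on $n$ we write $f \asymp h$ when $\lim_{n \to \infty} f/h = 1$.
Hence, in this limiting case $A$ and $B$ satisfy

$$A \asymp \frac{2(1 - e^{-\alpha})}{e^{-\alpha} - e^{-2\alpha}(\alpha^2 - \alpha + 1)} \quad \text{and} \quad B \asymp \frac{1}{\sqrt{n}\,g(\alpha)}.$$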
In the nonuniform case similar results hold with some additional conditions. Letting

$$\|p\| = \sup_{1 \le i \le m} p_i \quad \text{and} \quad \gamma = \gamma(n) = \max(n\|p\|, 1),$$

in [25] it is shown that when $\|p\| \le 1/11$ and $n \ge 83\gamma^2(1 + 3\gamma + 3\gamma^2)e^{1.05\gamma}$, there exists a coupling such that

$$|Y^s - Y| \le 3 \quad \text{and} \quad \frac{\mu}{\sigma^2} \le 8165\gamma^2e^{2.1\gamma}.$$

Now also using Theorem 2.4 in [25] for a bound on $\sigma^2$, we find that (3) of Theorem 1.1 holds with

$$A = 24{,}495\,\gamma^2e^{2.1\gamma} \quad \text{and} \quad B = \frac{1.5\sqrt{7776}\,\gamma e^{1.05\gamma}}{\sqrt{n}}.$$



4.4 An application to coverage processes 

We consider the following coverage process, and associated coupling, from [14]. Given a collection
$\mathcal{U}_n = \{U_1, U_2, \dots, U_n\}$ of independent, uniformly distributed points in the $d$ dimensional torus of volume $n$, that
is, the cube $C_n = [0, n^{1/d})^d \subset \mathbb{R}^d$ with periodic boundary conditions, let $V$ denote the total volume of the
union of the $n$ balls of fixed radius $\rho$ centered at these $n$ points, and $S$ the number of balls isolated at
distance $\rho$, that is, those points for which none of the other $n - 1$ points lie within distance $\rho$. The random
variables V and S are of fundamental interest in stochastic geometry, see [T^ and [23]. If n — + 00 and p 
remains fixed, both V and S satisfy a central limit theorem [T71IH1I1B]. The distance of properly 
standardized, to the normal is studied in [9] using Stein's method. The quality of the normal approximation
to the distributions of both $V$ and $S$, in the Kolmogorov metric, is studied in [14] using Stein's method via
size bias couplings.

In more detail, for x e C„ and r > let B^{x) denote the ball of radius r centered at x, and Bi ^. = 
B{Ui^ r). The covered volume V and number of isolated balls S are given, respectively, by 

n n 

F = Volume(|jB,,p) and 5 = ^ 1{(W„ n B,,p = {(7,}}. (27) 

1=1 i=l 

We will derive concentration of measure inequalities for V and S with the help of the bounded size biased 
coupHngs in [H] , 

Assume $d \ge 1$ and $n \ge 4$. Denote the mean and variance of $V$ by $\mu_V$ and $\sigma_V^2$, respectively, and likewise for $S$, leaving their dependence on $n$ and $\rho$ implicit. Let $\pi_d = \pi^{d/2}/\Gamma(1 + d/2)$, the volume of the unit sphere in $\mathbb{R}^d$, and for fixed $\rho$ let $\phi = \pi_d\rho^d$. For $0 \le r \le 2$ let $\omega_d(r)$ denote the volume of the union of two unit balls with centers $r$ units apart. We have $\omega_1(r) = 2 + r$, and

$$\omega_d(r) = \pi_d + \pi_{d-1}\int_0^r \left(1 - (t/2)^2\right)^{(d-1)/2} dt, \quad\text{for } d \ge 2.$$

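From [14], the means of $V$ and $S$ are given by

$$\mu_V = n\left(1 - (1 - \phi/n)^n\right) \quad\text{and}\quad \mu_S = n(1 - \phi/n)^{n-1}, \tag{28}$$

and their variances by

$$\sigma_V^2 = n\int_{B_{2\rho}(0)}\left(1 - \frac{\rho^d\omega_d(|y|/\rho)}{n}\right)^n dy + n(n - 2^d\phi)\left(1 - \frac{2\phi}{n}\right)^n - n^2(1 - \phi/n)^{2n}, \tag{29}$$

and

$$\sigma_S^2 = n(1 - \phi/n)^{n-1}\left(1 - (1 - \phi/n)^{n-1}\right) + (n-1)\int_{B_{2\rho}(0)\setminus B_\rho(0)}\left(1 - \frac{\rho^d\omega_d(|y|/\rho)}{n}\right)^{n-2} dy + n(n-1)\left(\left(1 - \frac{2^d\phi}{n}\right)\left(1 - \frac{2\phi}{n}\right)^{n-2} - \left(1 - \frac{\phi}{n}\right)^{2n-2}\right). \tag{30}$$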
It is shown in [14], by using a coupling similar to the one briefly described for the urn allocation problem in Section 4.3, that one can construct $V^s$ with the $V$ size bias distribution which satisfies $|V^s - V| \le \phi$. Hence (3) of Theorem 1.1 holds for $V$ with

$$A_V = \frac{\phi\mu_V}{\sigma_V^2} \quad\text{and}\quad B_V = \frac{\phi}{2\sigma_V},$$

where $\mu_V$ and $\sigma_V^2$ are given in (28) and (29), respectively. Similarly, with $Y = n - S$ the number of non-isolated balls, it is shown that $Y^s$ with $Y$ size bias distribution can be constructed so that $|Y^s - Y| \le \kappa_d + 1$, where $\kappa_d$ denotes the maximum number of open unit balls in $d$ dimensions that can be packed so they all intersect an open unit ball in the origin, but are disjoint from each other. Hence (3) of Theorem 1.1 holds for $Y$ with

$$A_Y = \frac{(\kappa_d + 1)(n - \mu_S)}{\sigma_S^2} \quad\text{and}\quad B_Y = \frac{\kappa_d + 1}{2\sigma_S}.$$

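To see how $A_V$, $A_Y$ and $B_V$, $B_Y$ behave as $n \to \infty$, let

$$J_{r,d}(\rho) = d\pi_d\int_0^r \exp(-\rho^d\omega_d(t))\,t^{d-1}\,dt,$$

and define

$$g_V(\rho) = \rho^d J_{2,d}(\rho) - (2^d\phi + \phi^2)e^{-2\phi} \quad\text{and}$$

$$g_S(\rho) = e^{-\phi} - \left(1 + (2^d - 2)\phi + \phi^2\right)e^{-2\phi} + \rho^d\left(J_{2,d}(\rho) - J_{1,d}(\rho)\right).$$

Then, again from [14],

$$\lim_{n\to\infty} n^{-1}\mu_V = \lim_{n\to\infty}(1 - n^{-1}\mu_S) = 1 - e^{-\phi},$$

$$\lim_{n\to\infty} n^{-1}\sigma_V^2 = g_V(\rho) > 0, \quad\text{and}\quad \lim_{n\to\infty} n^{-1}\sigma_S^2 = g_S(\rho) > 0.$$

Hence, $B_V$ and $B_Y$ tend to zero at rate $n^{-1/2}$, and

$$\lim_{n\to\infty} A_V = \frac{\phi(1 - e^{-\phi})}{g_V(\rho)} \quad\text{and}\quad \lim_{n\to\infty} A_Y = \frac{(\kappa_d + 1)(1 - e^{-\phi})}{g_S(\rho)}.$$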
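As a numerical illustration of the means in (28) and of the limiting constants $g_V(\rho)$, $g_S(\rho)$ and $\lim A_V$, the following Python sketch evaluates them with a midpoint rule. The function names, the step count and the choice of quadrature are assumptions made for this illustration and do not come from [14]; the formula $\omega_d(r) = \pi_d + \pi_{d-1}\int_0^r (1 - (t/2)^2)^{(d-1)/2}dt$ is used for every $d \ge 1$, which for $d = 1$ reduces to $2 + r$.

```python
import math

def unit_ball_volume(d):
    """pi_d = pi^(d/2) / Gamma(1 + d/2), the volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)

def means(n, d, rho):
    """Means (28) of the covered volume V and of the number S of isolated balls."""
    phi = unit_ball_volume(d) * rho ** d
    mu_v = n * (1 - (1 - phi / n) ** n)
    mu_s = n * (1 - phi / n) ** (n - 1)
    return mu_v, mu_s

def limiting_constants(d, rho, steps=4000):
    """Midpoint-rule approximation of g_V, g_S and lim A_V (illustrative only)."""
    pi_d, pi_dm1 = unit_ball_volume(d), unit_ball_volume(d - 1)
    phi = pi_d * rho ** d
    h = 2.0 / steps
    inner = 0.0      # running value of int_0^t (1 - (s/2)^2)^((d-1)/2) ds
    j1 = j2 = 0.0    # approximations of J_{1,d}(rho) and J_{2,d}(rho)
    for k in range(steps):
        t = (k + 0.5) * h
        g = (1 - (t / 2) ** 2) ** ((d - 1) / 2)
        omega_t = pi_d + pi_dm1 * (inner + 0.5 * h * g)   # omega_d(t)
        term = d * pi_d * math.exp(-rho ** d * omega_t) * t ** (d - 1) * h
        j2 += term
        if t < 1.0:
            j1 += term
        inner += h * g
    g_v = rho ** d * j2 - (2 ** d * phi + phi ** 2) * math.exp(-2 * phi)
    g_s = (math.exp(-phi)
           - (1 + (2 ** d - 2) * phi + phi ** 2) * math.exp(-2 * phi)
           + rho ** d * (j2 - j1))
    a_v_limit = phi * (1 - math.exp(-phi)) / g_v
    return g_v, g_s, a_v_limit
```

For $d = 1$ the integrals are elementary, so the output can be compared against closed forms, which is a convenient sanity check before trying $d = 2$ or $d = 3$.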



4.5 The lightbulb problem 

The following stochastic process, known informally as the 'lightbulb process', arises in a pharmaceutical study 
of dermal patches, see [29]. Changing dermal receptors to lightbulbs allows for a more colorful description. 
Consider n lightbulbs, each operated by a switch. At day zero, none of the bulbs are on. At day r for 
r = 1, . . . ,n, the position of r of the n switches are selected uniformly to be changed, independent of the 
past. One is interested in studying the distribution of the number of lightbulbs which are switched on at the 
terminal time n. The process just described is Markovian, and is studied in some detail in [33]. In [16] the authors use Stein's method to derive a bound to the normal via a monotone, bounded size bias coupling. Borrowing this coupling here allows for the application of Theorem 1.1 to obtain concentration of measure inequalities for the lightbulb problem. We begin with a more detailed description of the process.

For $r = 1, \ldots, n$, let $\{X_{rk}, k = 1, \ldots, n\}$ have distribution

$$P(X_{r1} = e_1, \ldots, X_{rn} = e_n) = \binom{n}{r}^{-1} \quad\text{for all } e_k \in \{0, 1\} \text{ with } \sum_{k=1}^n e_k = r,$$
and let these collections of variables be independent over $r$. These 'switch variables' $X_{rk}$ indicate whether or not on day $r$ bulb $k$ had its status changed. With

$$Y_k = \Big(\sum_{r=1}^n X_{rk}\Big) \bmod 2$$

therefore indicating the status of bulb $k$ at time $n$, the number of bulbs switched on at the terminal time is

$$Y = \sum_{k=1}^n Y_k.$$

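From [29], the mean $\mu$ and variance $\sigma^2$ of $Y$ are given by

$$\mu = \frac{n}{2}\left[1 - \prod_{i=1}^n\left(1 - \frac{2i}{n}\right)\right] \tag{31}$$

and

$$\sigma^2 = \frac{n}{4}\left[1 - \prod_{i=1}^n\left(1 - \frac{4i}{n} + \frac{4i(i-1)}{n(n-1)}\right)\right] + \frac{n^2}{4}\left[\prod_{i=1}^n\left(1 - \frac{4i}{n} + \frac{4i(i-1)}{n(n-1)}\right) - \prod_{i=1}^n\left(1 - \frac{2i}{n}\right)^2\right]. \tag{32}$$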



Note that when $n$ is even $\mu = n/2$ exactly, as the product in (31) is zero, containing the term $i = n/2$. By results in [29], in the odd case $\mu = (n/2)(1 + O(e^{-n}))$, and in both the even and odd cases $\sigma^2 = (n/4)(1 + O(e^{-n}))$.
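The mean and variance of the lightbulb count can be checked exactly for small $n$. The sketch below evaluates the two products in rational arithmetic and compares the result with a brute-force enumeration of all equally likely switch configurations. The function names are hypothetical, and the closed form coded here is the product representation $\mu = (n/2)[1 - \lambda_1]$, $\sigma^2 = (n/4)[1 - \lambda_2] + (n^2/4)[\lambda_2 - \lambda_1^2]$ with $\lambda_1 = \prod_i (1 - 2i/n)$ and $\lambda_2 = \prod_i (1 - 4i/n + 4i(i-1)/(n(n-1)))$, assumed to be the content of (31) and (32).

```python
from fractions import Fraction
from itertools import combinations, product

def lightbulb_mean_var(n):
    """Exact mean and variance of the number of bulbs on at time n (needs n >= 2)."""
    lam1 = Fraction(1)   # prod_i (1 - 2i/n)
    lam2 = Fraction(1)   # prod_i (1 - 4i/n + 4i(i-1)/(n(n-1)))
    for i in range(1, n + 1):
        lam1 *= 1 - Fraction(2 * i, n)
        lam2 *= 1 - Fraction(4 * i, n) + Fraction(4 * i * (i - 1), n * (n - 1))
    mean = Fraction(n, 2) * (1 - lam1)
    var = Fraction(n, 4) * (1 - lam2) + Fraction(n * n, 4) * (lam2 - lam1 ** 2)
    return mean, var

def brute_force_mean_var(n):
    """Enumerate every equally likely switch configuration (small n only)."""
    stages = [list(combinations(range(n), r)) for r in range(1, n + 1)]
    total = total_sq = count = 0
    for config in product(*stages):
        flips = [0] * n
        for switched in config:        # bulbs whose switch is changed on this day
            for k in switched:
                flips[k] += 1
        y = sum(f % 2 for f in flips)  # number of bulbs on at the terminal time
        total, total_sq, count = total + y, total_sq + y * y, count + 1
    mean = Fraction(total, count)
    return mean, Fraction(total_sq, count) - mean ** 2
```

For instance, with $n = 3$ both functions return mean $4/3$ and variance $8/9$, and for even $n$ the mean is exactly $n/2$.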

The following construction, given in [16] for the case where $n$ is even, couples $Y$ to a variable $Y^s$ having the $Y$ size bias distribution such that

$$Y \le Y^s \le Y + 2, \tag{33}$$



that is, the coupling is monotone, with difference bounded by 2. For every $i \in \{1, \ldots, n\}$ construct the collection of switch variables $\mathbf{X}^i$ from $\mathbf{X} = \{X_{rk} : r, k = 1, \ldots, n\}$ as follows. If $Y_i = 1$, that is, if bulb $i$ is on, let $\mathbf{X}^i = \mathbf{X}$. Otherwise, with $J^i$ chosen uniformly from $\{j : X_{n/2,j} = 1 - X_{n/2,i}\}$, let $\mathbf{X}^i = \{X^i_{rk} : r, k = 1, \ldots, n\}$ where

$$X^i_{rk} = \begin{cases} X_{rk} & r \ne n/2, \\ X_{n/2,k} & r = n/2,\ k \notin \{i, J^i\}, \\ X_{n/2,J^i} & r = n/2,\ k = i, \\ X_{n/2,i} & r = n/2,\ k = J^i, \end{cases}$$

and let $Y^i = \sum_{k=1}^n Y^i_k$ where

$$Y^i_k = \Big(\sum_{r=1}^n X^i_{rk}\Big) \bmod 2.$$

Then, with $I$ uniformly chosen from $\{1, \ldots, n\}$ and independent of all other variables, it is shown in [16] that the mixture $Y^s = Y^I$ has the $Y$ size biased distribution, essentially due to the fact that

$$\mathcal{L}(Y^i) = \mathcal{L}(Y \mid Y_i = 1) \quad\text{for all } i = 1, \ldots, n.$$

It is not difficult to see that $Y^s$ satisfies (33). If $Y_I = 1$ then $\mathbf{X}^I = \mathbf{X}$, and so in this case $Y^s = Y$. Otherwise $Y_I = 0$, and for the given $I$ the collection $\mathbf{X}^I$ is constructed from $\mathbf{X}$ by interchanging the stage $n/2$, unequal, switch variables $X_{n/2,I}$ and $X_{n/2,J^I}$. If $Y_{J^I} = 1$ then after the interchange $Y^I_I = 1$ and $Y^I_{J^I} = 0$, in which case $Y^s = Y$. If $Y_{J^I} = 0$ then after the interchange $Y^I_I = 1$ and $Y^I_{J^I} = 1$, yielding $Y^s = Y + 2$. We conclude that for the case $n$ even $C = 2$ and (2) and (3) of Theorem 1.1 hold with

$$A = n/\sigma^2 \quad\text{and}\quad B = 1/\sigma, \tag{34}$$

where $\sigma^2$ is given by (32).
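The even-$n$ coupling can be verified exhaustively for a small case. The sketch below enumerates every switch configuration, applies the stage $n/2$ interchange for each index $i$ and each admissible partner $j$, and returns the exact laws of $Y$ and of the coupled variable together with a flag recording whether $Y \le Y^s \le Y + 2$ always held. The helper names are hypothetical, and reading the construction as "swap the stage $n/2$ switch variables of bulb $i$ and a uniformly chosen bulb with the opposite stage $n/2$ value" is an assumption of this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def switch_rows(n, r):
    """All 0/1 vectors of length n with exactly r ones (switches changed on day r)."""
    return [tuple(1 if k in chosen else 0 for k in range(n))
            for chosen in combinations(range(n), r)]

def bulb_status(config):
    """Terminal status of every bulb: column sums of the switch array, mod 2."""
    return [sum(column) % 2 for column in zip(*config)]

def coupling_laws(n):
    """Exact laws of Y and of the coupled Y^s for even n, by full enumeration."""
    assert n % 2 == 0, "the construction sketched here is for even n"
    half = n // 2 - 1                       # row index of stage n/2
    configs = list(product(*(switch_rows(n, r) for r in range(1, n + 1))))
    weight = Fraction(1, len(configs))
    law_y, law_ys = Counter(), Counter()
    within_bounds = True
    for config in configs:
        status = bulb_status(config)
        y = sum(status)
        law_y[y] += weight
        row = config[half]
        for i in range(n):                  # I = i with probability 1/n
            if status[i] == 1:              # bulb i already on: keep the configuration
                law_ys[y] += weight / n
                continue
            partners = [j for j in range(n) if row[j] != row[i]]
            for j in partners:              # J^i = j, uniform over the partners
                swapped = list(row)
                swapped[i], swapped[j] = row[j], row[i]
                new_config = config[:half] + (tuple(swapped),) + config[half + 1:]
                ys = sum(bulb_status(new_config))
                within_bounds = within_bounds and (y <= ys <= y + 2)
                law_ys[ys] += weight / n / len(partners)
    return law_y, law_ys, within_bounds
```

Comparing `law_ys[y]` with `y * law_y[y] / mu` for each `y` checks the defining size bias relation $P(Y^s = y) = yP(Y = y)/\mu$ exactly; the enumeration grows very quickly with $n$, so this is only practical for $n = 2$ or $n = 4$.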

For the coupling in the odd case, $n = 2m + 1$ say, due to the parity issue, [16] considers a random variable $V$ close to $Y$ constructed as follows. In all stages but stage $m$ and $m + 1$ let the switch variables which will yield $V$ be the same as those for $Y$. In stage $m$, however, with probability 1/2 one applies an additional switch variable, and in stage $m + 1$, with probability 1/2, one switch variable fewer. In this way the switch variables in these two stages have the same, symmetric distribution and are close to the switch variables for $Y$. In particular, as at most two switch variables are different in the configuration for $V$, we have $|V - Y| \le 2$. Helped by the symmetry, one may couple $V$ to a variable $V^s$ with the $V$ size bias distribution as in the even case, obtaining $V \le V^s \le V + 2$. Hence (2) and (3) of Theorem 1.1 hold for $V$ as for the even case with values given in (34), where $\mu = n/2$ and $\sigma^2 = (n/4)(1 + O(e^{-n}))$. Since $|V - Y| \le 2$, by replacing $t$ by $t + 2/\sigma$ in the bounds for $V$ one obtains bounds for the odd case $Y$.



5 Applications: unbounded couplings 

One of the major drawbacks of Theorem 1.1 is the hypothesis that $|Y^s - Y|$ be bounded with probability one. In this section we derive concentration of measure inequalities for two examples where $Y^s - Y$ is not bounded: the number of isolated vertices in the Erdős-Rényi random graph model, and the nonnegative infinitely divisible distributions with certain associated moment generating functions which satisfy a boundedness condition. For the latter, compound Poisson distributions will be our main illustration.



5.1 Number of isolated vertices in the Erdős-Rényi random graph model

Let $K_{n,p}$ be the random graph on the vertices $\mathcal{V} = \{1, 2, \ldots, n\}$, with the indicators $X_{vw}$ of the presence of edges between two unequal vertices $v$ and $w$ being independent Bernoulli $p \in (0, 1)$ variables, and $X_{vv} = 0$ for all $v \in \mathcal{V}$. Recall that the degree of a vertex $v \in \mathcal{V}$ is the number of edges incident on $v$,

$$d(v) = \sum_{w \in \mathcal{V}} X_{vw}. \tag{35}$$

The problem of approximating the distribution of the number of vertices v with degree d{v) = d for some 
fixed d was considered in [5], and a smooth function bound to the multivariate normal for a vector whose 
components count the number of vertices of some fixed degrees was given in [15].

Here we study the number of isolated vertices $Y_{n,p}$ of $K_{n,p}$, that is, those vertices which have no incident edges, given by

$$Y_{n,p} = \sum_{v \in \mathcal{V}} \mathbf{1}(d(v) = 0).$$

In [19], the mean $\mu_{n,p}$ and variance $\sigma^2_{n,p}$ of $Y_{n,p}$ are given as

$$\mu_{n,p} = n(1-p)^{n-1} \quad\text{and}\quad \sigma^2_{n,p} = n(1-p)^{n-1}\left(1 + np(1-p)^{n-2} - (1-p)^{n-2}\right), \tag{36}$$

where also Kolmogorov distance bounds to the normal were obtained, and asymptotic normality shown when

$$n^2 p \to \infty \quad\text{and}\quad np - \log(n) \to -\infty.$$
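The moment formulas for the number of isolated vertices can be confirmed exactly on very small graphs. The following sketch sums over all $2^{\binom{n}{2}}$ graphs with rational edge probability and compares the result with the closed forms $\mu = n(1-p)^{n-1}$ and $\sigma^2 = \mu(1 + np(1-p)^{n-2} - (1-p)^{n-2})$. The function names are hypothetical and the enumeration is only meant as a sanity check for $n \le 5$ or so.

```python
from fractions import Fraction
from itertools import combinations, product

def isolated_mean_var(n, p):
    """Closed-form mean and variance of the number of isolated vertices (n >= 2)."""
    q = 1 - p
    mean = n * q ** (n - 1)
    var = mean * (1 + n * p * q ** (n - 2) - q ** (n - 2))
    return mean, var

def brute_force_isolated(n, p):
    """Exact moments by summing over all 2^(n choose 2) graphs (small n only)."""
    edges = list(combinations(range(n), 2))
    m1 = m2 = 0
    for present in product((0, 1), repeat=len(edges)):
        prob = 1
        degree = [0] * n
        for (v, w), x in zip(edges, present):
            prob *= p if x else 1 - p      # independent Bernoulli(p) edges
            degree[v] += x
            degree[w] += x
        y = sum(1 for deg in degree if deg == 0)
        m1 += prob * y
        m2 += prob * y * y
    return m1, m2 - m1 ** 2
```

Passing `p` as a `Fraction` keeps both computations exact, so the two functions can be compared with `==` rather than a tolerance.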

O'Connell [23] shows an asymptotic large deviation principle holds for $Y_{n,p}$. Raic [28] obtained nonuniform large deviation bounds in some generality for random variables $W$ with $E(W) = 0$ and $\mathrm{Var}(W) = 1$ of the form

$$\frac{P(W \ge t)}{1 - \Phi(t)} \le e^{t^3\beta(t)/6}\left(1 + Q(t)\beta(t)\right) \quad\text{for all } t \ge 0, \tag{37}$$

where $\Phi$ denotes the distribution function of a standard normal variate and $Q(t)$ is a quadratic in $t$. Although in general the expression for $\beta(t)$ is not simple, when $W$ is $Y_{n,p}$ properly standardized and $np \to c$ as $n \to \infty$, then (37) holds for all $n$ sufficiently large with

$$\beta(t) = \frac{C_1}{\sqrt{n}}\exp\left(\frac{C_2 t}{\sqrt{n}} + C_3\left(e^{C_4 t/\sqrt{n}} - 1\right)\right)$$

for some constants $C_1, C_2, C_3$ and $C_4$. For $t$ of order $n^{1/2}$, for instance, the function $\beta(t)$ will be small as $n \to \infty$, allowing an approximation of the deviation probability $P(W \ge t)$ by the normal, to within some factors. Theorem 5.1 below, by contrast, provides a non-asymptotic bound, that is, not relying on any limiting relations between $n$ and $p$, with explicit constants, which holds for every $n$. Moreover, the bound is of order $e^{-at^2}$ over some range of $t$, and of worst case order $e^{-bt}$, for the right tail by (40), and $e^{-ct^2}$ by (39) for the left tail, where $a$, $b$ and $c$ are explicit, with the bounds holding for all $t \in \mathbb{R}$.

For notational ease, we keep the dependence on $n$ and $p$ implicit in the sequel.

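Theorem 5.1. Let $K$ denote the random graph on $n$ vertices where each edge is present with probability $p \in (0, 1)$, independently of all other edges, and let $Y$ denote the number of isolated vertices in $K$. Then for all $t > 0$,

$$P\left(\frac{Y - \mu}{\sigma} \ge t\right) \le \inf_{\theta \ge 0}\exp(-\theta t + H(\theta)) \quad\text{where}\quad H(\theta) = \frac{\mu}{2\sigma^2}\int_0^\theta s\gamma_s\,ds, \tag{38}$$

with the mean $\mu$ and variance $\sigma^2$ of $Y$ given in (36), and

$$\gamma_s = 2e^{2s}\left(1 + \frac{pe^s}{1-p}\right)^n + \beta + 1 \quad\text{where}\quad \beta = (1-p)^{-n}.$$

For the left tail, for all $t > 0$,

$$P\left(\frac{Y - \mu}{\sigma} \le -t\right) \le \exp\left(-\frac{t^2\sigma^2}{2\mu(\beta + 1)}\right). \tag{39}$$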



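Remark 5.1. Though the minimization in (38) is admittedly cumbersome, useful bounds may be obtained by restricting the minimization to $\theta \in [0, \theta_0]$ for some $\theta_0$. In this case, as $\gamma_s$ is an increasing function of $s$, we have

$$H(\theta) \le \frac{\mu\gamma_{\theta_0}}{4\sigma^2}\theta^2 \quad\text{for } \theta \in [0, \theta_0].$$

The quadratic $-\theta t + \mu\gamma_{\theta_0}\theta^2/(4\sigma^2)$ in $\theta$ is minimized at $\theta = 2t\sigma^2/(\mu\gamma_{\theta_0})$. When this value falls in $[0, \theta_0]$ we obtain the first bound in (40), while otherwise setting $\theta = \theta_0$ yields the second.

$$P\left(\frac{Y - \mu}{\sigma} \ge t\right) \le \begin{cases} \exp\left(-\dfrac{t^2\sigma^2}{\mu\gamma_{\theta_0}}\right) & \text{for } t \in [0, \theta_0\mu\gamma_{\theta_0}/(2\sigma^2)], \\[2ex] \exp\left(-\theta_0 t + \dfrac{\mu\gamma_{\theta_0}\theta_0^2}{4\sigma^2}\right) & \text{for } t \in (\theta_0\mu\gamma_{\theta_0}/(2\sigma^2), \infty). \end{cases} \tag{40}$$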



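Though Theorem 5.1 is not an asymptotic result, as it gives bounds for any specific $n$ and $p$, when $np \to c$ as $n \to \infty$ we have

$$\frac{\sigma^2}{\mu} \to 1 + ce^{-c} - e^{-c}, \quad \beta + 1 \to e^c + 1 \quad\text{and}\quad \gamma_s \to 2e^{2s + ce^s} + e^c + 1 \quad\text{as } n \to \infty.$$

Hence, the left tail bound (39), for example, in this asymptotic behaves as

$$\lim_{n\to\infty}\exp\left(-\frac{t^2\sigma^2}{2\mu(\beta + 1)}\right) = \exp\left(-\frac{t^2}{2}\cdot\frac{1 + ce^{-c} - e^{-c}}{e^c + 1}\right).$$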
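To get a feel for the size of these bounds, the following sketch evaluates the left tail bound and the restricted right tail bound for given $n$, $p$, $t$ and a user-chosen $\theta_0$. It assumes the forms $\exp(-t^2\sigma^2/(2\mu(\beta+1)))$ for the left tail and the two-piece bound obtained from $H(\theta) \le \mu\gamma_{\theta_0}\theta^2/(4\sigma^2)$ for the right tail, with $\gamma_s = 2e^{2s}(1 + pe^s/(1-p))^n + \beta + 1$ and $\beta = (1-p)^{-n}$; the function name and the default value of `theta0` are choices made for this illustration only.

```python
import math

def isolated_vertex_bounds(n, p, t, theta0=1.0):
    """Left tail bound and restricted right tail bound for (Y - mu)/sigma."""
    q = 1 - p
    mu = n * q ** (n - 1)
    sigma2 = mu * (1 + n * p * q ** (n - 2) - q ** (n - 2))
    beta = q ** (-n)
    # gamma_s evaluated at s = theta0
    gamma0 = 2 * math.exp(2 * theta0) * (1 + p * math.exp(theta0) / q) ** n + beta + 1
    left = math.exp(-t ** 2 * sigma2 / (2 * mu * (beta + 1)))
    switch = theta0 * mu * gamma0 / (2 * sigma2)   # end of the Gaussian range
    if t <= switch:
        right = math.exp(-t ** 2 * sigma2 / (mu * gamma0))
    else:
        right = math.exp(-theta0 * t + mu * gamma0 * theta0 ** 2 / (4 * sigma2))
    return left, right
```

Because $\gamma_{\theta_0}$ grows quickly in $\theta_0$, a small `theta0` (for example 0.1 when $np$ is around 2) tends to give a more useful Gaussian range than the default; scanning a few values of `theta0` is a cheap substitute for the full minimization.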
Proof. We first review the construction of $Y^s$, having the $Y$ size bias distribution, as given in [15]. Let $K$, a particular realization of $K(n,p)$, be given, and let $Y$ be the number of isolated vertices for this realization. To size bias $Y$, choose one of the $n$ vertices of $K$ uniformly. If the chosen vertex, say $V$, is already isolated, we do nothing and set $K^s = K$. Otherwise obtain $K^s$ by deleting all the edges connected to $V$. Then $Y^s$, the number of isolated vertices of $K^s$, has the $Y$ size biased distribution.

To derive the needed properties of this coupling, let $N(v)$ be the set of neighbors of $v \in \mathcal{V}$, and $\mathcal{T}$ the collection of isolated vertices of $K$, that is, with $d(v)$ the degree of $v$, given in (35),

$$N(v) = \{w : X_{vw} = 1\} \quad\text{and}\quad \mathcal{T} = \{v : d(v) = 0\}.$$

Note that $Y = |\mathcal{T}|$. Since all edges incident to the chosen $V$ are removed in order to form $K^s$, any neighbor of $V$ which had degree one thus becomes isolated, and $V$ also becomes isolated if it was not so earlier. As all other vertices are otherwise unaffected, as far as their being isolated or not, we have

$$Y^s - Y = d_1(V) + \mathbf{1}(d(V) \ne 0) \quad\text{where}\quad d_1(V) = \sum_{w \in N(V)} \mathbf{1}(d(w) = 1), \tag{41}$$

so in particular the coupling is monotone. Since $d_1(V) \le d(V)$, (41) yields

$$Y^s - Y \le d(V) + 1. \tag{42}$$
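The identity for $Y^s - Y$ is easy to check by simulation. The sketch below samples a graph, picks a vertex uniformly, deletes its edges, and returns the realized change in the number of isolated vertices next to $d_1(V) + \mathbf{1}(d(V) \ne 0)$ and $d(V)$. The function name and the adjacency-set representation are illustrative assumptions; the code checks only the pathwise identity, not that the resulting variable has the size biased distribution.

```python
import random

def size_bias_step(n, p, rng):
    """Sample K, pick V uniformly, delete V's edges; return (Y^s - Y, d_1(V) + 1(d(V) != 0), d(V))."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for w in range(v + 1, n):
            if rng.random() < p:            # edge {v, w} present with probability p
                adj[v].add(w)
                adj[w].add(v)
    y = sum(1 for v in adj if not adj[v])   # isolated vertices of K
    chosen = rng.randrange(n)               # the uniformly chosen vertex V
    d_v = len(adj[chosen])
    d1_v = sum(1 for w in adj[chosen] if len(adj[w]) == 1)
    for w in list(adj[chosen]):             # remove all edges incident to V
        adj[w].discard(chosen)
    adj[chosen].clear()
    ys = sum(1 for v in adj if not adj[v])  # isolated vertices after the deletion
    return ys - y, d1_v + (1 if d_v != 0 else 0), d_v
```

Running this repeatedly with a seeded `random.Random` and asserting that the first two returned values agree, and that the first is between 0 and `d_v + 1`, exercises both the identity and the monotone bound.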
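By (5), using that the coupling is monotone, for $\theta \ge 0$ we have

$$\begin{aligned} E(e^{\theta Y^s} - e^{\theta Y}) &\le \frac{\theta}{2}E\left((Y^s - Y)(e^{\theta Y^s} + e^{\theta Y})\right) \\ &= \frac{\theta}{2}E\left(e^{\theta Y}(Y^s - Y)(\exp(\theta(Y^s - Y)) + 1)\right) \\ &= \frac{\theta}{2}E\left\{e^{\theta Y}E\left((Y^s - Y)(\exp(\theta(Y^s - Y)) + 1) \mid \mathcal{T}\right)\right\}. \end{aligned} \tag{43}$$

Now using that $Y^s = Y$ when $V \in \mathcal{T}$, and (42), we have

$$\begin{aligned} E\left((Y^s - Y)(\exp(\theta(Y^s - Y)) + 1) \mid \mathcal{T}\right) &\le E\left((d(V) + 1)(\exp(\theta(d(V) + 1)) + 1)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) \\ &\le E\left(\left(e^\theta\left(d(V)e^{\theta d(V)} + e^{\theta d(V)}\right) + d(V)\right)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) + 1. \end{aligned} \tag{44}$$

Note that since $V$ is chosen independently of $K$,

$$\mathcal{L}(d(V)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}) = P(V \notin \mathcal{T})\,\mathcal{L}(\mathrm{Bin}(n-1-Y, p) \mid \mathrm{Bin}(n-1-Y, p) > 0) + P(V \in \mathcal{T})\,\delta_0, \tag{45}$$

where $\delta_0$ is point mass at zero. By (45), and that the mass function of the conditioned binomial there is

$$P(d(V) = k \mid \mathcal{T}, V \notin \mathcal{T}) = \begin{cases} \dbinom{n-1-Y}{k}\dfrac{p^k(1-p)^{n-1-Y-k}}{1 - (1-p)^{n-1-Y}} & \text{for } 1 \le k \le n-1-Y, \\[2ex] 0 & \text{otherwise,} \end{cases}$$

it can be easily verified that the conditional moment generating function of $d(V)$ and its first derivative are bounded by

$$E(e^{\theta d(V)}\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}) \le \frac{(pe^\theta + 1 - p)^{n-1-Y} - (1-p)^{n-1-Y}}{1 - (1-p)^{n-1-Y}} \quad\text{and}$$

$$E(d(V)e^{\theta d(V)}\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}) \le \frac{(n-1-Y)(pe^\theta + 1 - p)^{n-2-Y}pe^\theta}{1 - (1-p)^{n-1-Y}}.$$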
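By the mean value theorem applied to the function $f(x) = x^{n-1-Y}$, for some $\xi \in (1-p, 1)$ we have

$$1 - (1-p)^{n-1-Y} = f(1) - f(1-p) = (n-1-Y)p\,\xi^{n-2-Y} \ge (n-1-Y)p(1-p)^n.$$

Hence, recalling $\theta \ge 0$,

$$E(d(V)e^{\theta d(V)}\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}) \le \frac{(n-1-Y)(pe^\theta + 1 - p)^n pe^\theta}{(n-1-Y)p(1-p)^n} = \alpha_\theta \quad\text{where}\quad \alpha_\theta = e^\theta\left(1 + \frac{pe^\theta}{1-p}\right)^n. \tag{46}$$

Similarly applying the mean value theorem to $f(x) = (x + 1 - p)^{n-1-Y}$, for some $\xi \in (0, pe^\theta)$ we have

$$E(e^{\theta d(V)}\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}) \le \frac{(n-1-Y)(pe^\theta + (1-p))^{n-2-Y}pe^\theta}{1 - (1-p)^{n-1-Y}} \le \alpha_\theta, \tag{47}$$

as in (46).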

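Next, to handle the second to last term in (44) consider

$$E(d(V)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}) \le \frac{(n-1-Y)p}{1 - (1-p)^{n-1-Y}} \le \beta \quad\text{where}\quad \beta = (1-p)^{-n}. \tag{48}$$

Applying inequalities (46), (47) and (48) to (44) yields

$$E\left((Y^s - Y)(\exp(\theta(Y^s - Y)) + 1) \mid \mathcal{T}\right) \le \gamma_\theta \quad\text{where}\quad \gamma_\theta = 2e^\theta\alpha_\theta + \beta + 1. \tag{49}$$

Hence we obtain, using (43),

$$E(e^{\theta Y^s} - e^{\theta Y}) \le \frac{\theta\gamma_\theta}{2}E(e^{\theta Y}) \quad\text{for all } \theta \ge 0.$$

Letting $m(\theta) = E(e^{\theta Y})$ thus yields

$$m'(\theta) = E(Ye^{\theta Y}) = \mu E(e^{\theta Y^s}) \le \mu\left(1 + \frac{\theta\gamma_\theta}{2}\right)m(\theta). \tag{50}$$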

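Setting

$$M(\theta) = E(\exp(\theta(Y - \mu)/\sigma)) = e^{-\theta\mu/\sigma}m(\theta/\sigma),$$

differentiating and using (50), we obtain

$$\begin{aligned} M'(\theta) &= \frac{1}{\sigma}e^{-\theta\mu/\sigma}m'(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) \\ &\le \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}\left(1 + \frac{\theta\gamma_\theta}{2\sigma}\right)m(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) \\ &= \frac{\mu\theta\gamma_\theta}{2\sigma^2}M(\theta). \end{aligned} \tag{51}$$

Since $M(0) = 1$, (51) yields upon integration of $M'(s)/M(s)$ over $[0, \theta]$,

$$\log(M(\theta)) \le H(\theta) \quad\text{so that}\quad M(\theta) \le \exp(H(\theta)) \quad\text{where}\quad H(\theta) = \frac{\mu}{2\sigma^2}\int_0^\theta s\gamma_s\,ds.$$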
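Hence for $t \ge 0$,

$$P\left(\frac{Y - \mu}{\sigma} \ge t\right) \le P\left(\exp\left(\frac{\theta(Y - \mu)}{\sigma}\right) \ge e^{\theta t}\right) \le e^{-\theta t}M(\theta) \le \exp(-\theta t + H(\theta)).$$

As the inequality holds for all $\theta \ge 0$, it holds for the $\theta$ achieving the minimal value, proving (38).

For the left tail bound let $\theta < 0$. Since $Y^s \ge Y$ and $\theta < 0$, using (5) and (42) we obtain

$$\begin{aligned} E(e^{\theta Y} - e^{\theta Y^s}) &\le |\theta|E\left(e^{\theta Y}(Y^s - Y)\right) \\ &= |\theta|E\left(e^{\theta Y}E(Y^s - Y \mid \mathcal{T})\right) \\ &\le |\theta|E\left(e^{\theta Y}E((d(V) + 1)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T})\right). \end{aligned}$$

Applying (48) we obtain

$$E(e^{\theta Y} - e^{\theta Y^s}) \le (\beta + 1)|\theta|E(e^{\theta Y}),$$

and therefore

$$m'(\theta) = \mu E(e^{\theta Y^s}) \ge \mu\left(1 + (\beta + 1)\theta\right)m(\theta).$$

Hence for $\theta < 0$,

$$\begin{aligned} M'(\theta) &= \frac{1}{\sigma}e^{-\theta\mu/\sigma}m'(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) \\ &\ge \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}\left(1 + (\beta + 1)\theta/\sigma\right)m(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) \\ &= \frac{\mu(\beta + 1)\theta}{\sigma^2}M(\theta). \end{aligned}$$

Dividing by $M(\theta)$ and integrating over $[\theta, 0]$ yields

$$\log(M(\theta)) \le \frac{\mu(\beta + 1)\theta^2}{2\sigma^2}. \tag{52}$$

The inequality in (52) implies that for all $t > 0$ and $\theta < 0$,

$$P\left(\frac{Y - \mu}{\sigma} \le -t\right) \le \exp\left(\theta t + \frac{\mu(\beta + 1)\theta^2}{2\sigma^2}\right).$$

Taking $\theta = -t\sigma^2/(\mu(\beta + 1))$ we obtain (39). □

5.2 Infinitely divisible and compound Poisson distributions
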

The examples in this section generalize the application of Theorem 1.1 from the case where $Y$ is Poisson with parameter $\lambda > 0$. In this case, $Y$ admits a bounded coupling to a variable with its size bias distribution due to the characterization

$$E[Yf(Y)] = \lambda E[f(Y + 1)] \quad\text{if and only if } Y \sim \mathrm{Poisson}(\lambda), \tag{53}$$

which forms the basis of the Chen-Stein Poisson approximation method, see [8, 4]. In particular we may take $Y^s = Y + 1$, and, therefore $C = 1$. As the mean and variance for the Poisson are equal, and the coupling is monotone, applying Theorem 1.1 we obtain the following result.

Proposition 5.1. If $Y \sim \mathrm{Poisson}(\lambda)$, then for all $t > 0$,

$$P\left(\frac{Y - \lambda}{\sqrt{\lambda}} \le -t\right) \le \exp\left(-\frac{t^2}{2}\right) \quad\text{and}\quad P\left(\frac{Y - \lambda}{\sqrt{\lambda}} \ge t\right) \le \exp\left(-\frac{t^2}{2 + t\lambda^{-1/2}}\right).$$
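A quick way to see how sharp such Poisson bounds are is to compare them with exact tail probabilities. The sketch below computes the standardized left and right tails of a Poisson($\lambda$) variable by direct summation and returns them next to $\exp(-t^2/2)$ and $\exp(-t^2/(2 + t\lambda^{-1/2}))$, the bounds obtained with $A = 1$ and $B = 1/(2\sqrt{\lambda})$. The function names and the truncation point `kmax` are assumptions of this illustration; the truncation slightly underestimates the right tail, negligibly so for moderate $\lambda$.

```python
import math

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0, ..., kmax, computed recursively."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def poisson_tails_and_bounds(lam, t, kmax=400):
    """Exact standardized tails of Poisson(lam) next to the two exponential bounds."""
    pmf = poisson_pmf(lam, kmax)
    root = math.sqrt(lam)
    left_exact = sum(p for k, p in enumerate(pmf) if (k - lam) / root <= -t)
    right_exact = sum(p for k, p in enumerate(pmf) if (k - lam) / root >= t)
    left_bound = math.exp(-t ** 2 / 2)
    right_bound = math.exp(-t ** 2 / (2 + t / root))
    return left_exact, left_bound, right_exact, right_bound
```

Looping over a few values of `lam` and `t` and printing the four numbers side by side shows the right tail bound loosening as $t/\sqrt{\lambda}$ grows, which is the sub-exponential regime governed by $B$.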



The Poisson distribution is infinitely divisible, and also a special case of the compound Poisson distributions. We generalize Proposition 5.1 in these directions.
5.2.1 Infinitely divisible distributions

When $Y$ is Poisson then by (53) $Y^s = Y + 1$ and we may write

$$Y^s =_d Y + X \tag{54}$$

with $X$ and $Y$ independent. Theorem 5.3 of [33] shows that if $Y$ is nonnegative with finite mean then (54) holds if and only if $Y$ is infinitely divisible. Hence, in this case, a coupling of $Y$ to $Y^s$ may be achieved by generating the independent variable $X$ and adding it to $Y$. Since $Y^s$ is always stochastically larger than $Y$ we must have $X \ge 0$, and therefore this coupling is monotone. In addition $Y^s - Y = X$ so the coupling is bounded if and only if $X$ is bounded. When $X$ is unbounded, Theorem 5.2 provides concentration of measure inequalities for $Y$ under appropriate growth conditions on two generating functions in $Y$ and $X$. We assume without further mention that $Y$ is nontrivial, and note that therefore the means of both $Y$ and $X$ are positive.

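Theorem 5.2. Let $Y$ have a nonnegative infinitely divisible distribution and suppose that there exists $\gamma > 0$ so that $E(e^{\gamma Y}) < \infty$. Let $X$ have the distribution such that (54) holds when $Y$ and $X$ are independent, and assume $E(Xe^{\gamma X}) = C < \infty$. Letting $\mu = E(Y)$, $\sigma^2 = \mathrm{Var}(Y)$, $\nu = E(X)$ and $K = (C + \nu)/2$, the following concentration of measure inequalities hold for all $t > 0$,

$$P\left(\frac{Y - \mu}{\sigma} \le -t\right) \le \exp\left(-\frac{t^2\sigma^2}{2\nu\mu}\right), \qquad P\left(\frac{Y - \mu}{\sigma} \ge t\right) \le \begin{cases} \exp\left(-\dfrac{t^2\sigma^2}{2K\mu}\right) & \text{for } t \in [0, \gamma K\mu/\sigma^2), \\[2ex] \exp\left(-\gamma t + \dfrac{K\mu\gamma^2}{2\sigma^2}\right) & \text{for } t \in [\gamma K\mu/\sigma^2, \infty). \end{cases}$$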



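Proof. The proof is similar to that of Theorem 5.1. Since $Y^s = Y + X$ with $Y$ and $X$ independent and $X \ge 0$, using (5) with $\theta \in (0, \gamma)$ we have,

$$\begin{aligned} E(e^{\theta Y^s} - e^{\theta Y}) &\le \frac{\theta}{2}E\left((Y^s - Y)(e^{\theta Y^s} + e^{\theta Y})\right) \\ &= \frac{\theta}{2}E\left(X(e^{\theta X} + 1)e^{\theta Y}\right) = \frac{\theta}{2}E\left(X(e^{\theta X} + 1)\right)E(e^{\theta Y}) \\ &\le \frac{\theta}{2}\left(E(Xe^{\gamma X}) + E(X)\right)E(e^{\theta Y}) \\ &= K\theta m(\theta) \quad\text{where } K = (C + \nu)/2 \text{ and } m(\theta) = E(e^{\theta Y}). \end{aligned}$$

Now adding $m(\theta)$ to both sides yields

$$E(e^{\theta Y^s}) \le (1 + K\theta)m(\theta),$$

and therefore

$$m'(\theta) = E(Ye^{\theta Y}) = \mu E(e^{\theta Y^s}) \le \mu(1 + K\theta)m(\theta). \tag{55}$$

Again, with $M(\theta)$ the moment generating function of $(Y - \mu)/\sigma$,

$$M(\theta) = Ee^{\theta(Y - \mu)/\sigma} = e^{-\theta\mu/\sigma}m(\theta/\sigma),$$

by (55) we have,

$$\begin{aligned} M'(\theta) &= -(\mu/\sigma)e^{-\theta\mu/\sigma}m(\theta/\sigma) + e^{-\theta\mu/\sigma}m'(\theta/\sigma)/\sigma \\ &\le -(\mu/\sigma)e^{-\theta\mu/\sigma}m(\theta/\sigma) + (\mu/\sigma)e^{-\theta\mu/\sigma}\left(1 + K\theta/\sigma\right)m(\theta/\sigma) \\ &= (\mu/\sigma^2)K\theta M(\theta). \end{aligned} \tag{56}$$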
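Integrating, and using the fact that $M(0) = 1$ yields

$$M(\theta) \le \exp\left(\frac{K\mu\theta^2}{2\sigma^2}\right) \quad\text{for } \theta \in (0, \gamma).$$

Hence for a fixed $t > 0$, for all $\theta \in (0, \gamma)$,

$$P\left(\frac{Y - \mu}{\sigma} \ge t\right) \le e^{-\theta t}M(\theta) \le \exp\left(-\theta t + \frac{K\mu\theta^2}{2\sigma^2}\right).$$

The infimum of the quadratic in the exponent is attained at $\theta = t\sigma^2/(K\mu)$. When this value lies in $(0, \gamma)$ we obtain the first, right tail bound, for $t$ in the bounded interval, while setting $\theta = \gamma$ yields the second.

Moving on to the left tail bound, using (5) for $\theta < 0$ yields

$$E(e^{\theta Y} - e^{\theta Y^s}) \le \frac{|\theta|}{2}E\left((Y^s - Y)(e^{\theta Y} + e^{\theta Y^s})\right) \le -\theta E(Xe^{\theta Y}) = -\theta E(X)E(e^{\theta Y}).$$

Rearranging we obtain

$$m'(\theta) = \mu E(e^{\theta Y^s}) \ge \mu(1 + \theta\nu)m(\theta).$$

Following calculations similar to (56) one obtains

$$M'(\theta) \ge (\mu/\sigma^2)\nu\theta M(\theta) \quad\text{for all } \theta < 0,$$

which upon integration over $[\theta, 0]$ yields

$$M(\theta) \le \exp\left(\frac{\mu\nu\theta^2}{2\sigma^2}\right) \quad\text{for all } \theta < 0.$$

Hence for any fixed $t > 0$, for all $\theta < 0$,

$$P\left(\frac{Y - \mu}{\sigma} \le -t\right) \le e^{\theta t}M(\theta) \le \exp\left(\theta t + \frac{\mu\nu\theta^2}{2\sigma^2}\right). \tag{57}$$

Substituting $\theta = -t\sigma^2/(\nu\mu)$ in (57) yields the lower tail bound, thus completing the proof. □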
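The two-regime structure of these bounds is easy to explore numerically. The sketch below packages the lower tail bound $\exp(-t^2\sigma^2/(2\nu\mu))$ and the piecewise upper tail bound with switch point $\gamma K\mu/\sigma^2$ into one function, and specializes it to the Poisson case, where $X = 1$ so that $\nu = 1$ and $C = e^\gamma$. The function names and the argument order are choices made for this illustration only.

```python
import math

def infinitely_divisible_bounds(t, mu, sigma2, nu, c, gamma):
    """Lower and upper tail bounds for (Y - mu)/sigma; c stands for E(X e^{gamma X})."""
    k = (c + nu) / 2
    left = math.exp(-t ** 2 * sigma2 / (2 * nu * mu))
    switch = gamma * k * mu / sigma2          # boundary between the two regimes
    if t < switch:
        right = math.exp(-t ** 2 * sigma2 / (2 * k * mu))
    else:
        right = math.exp(-gamma * t + k * mu * gamma ** 2 / (2 * sigma2))
    return left, right

def poisson_case(t, lam, gamma=1.0):
    """Y ~ Poisson(lam): X = 1, so nu = 1, C = e^gamma and mu = sigma^2 = lam."""
    return infinitely_divisible_bounds(t, lam, lam, 1.0, math.exp(gamma), gamma)
```

In the Poisson case the lower tail bound reduces to $\exp(-t^2/2)$ for every $\lambda$, and the upper tail bound is continuous at the switch point, both of which make convenient checks on an implementation.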



Though Theorem 5.2 applies in principle to all nonnegative infinitely divisible distributions with generating functions for $Y$ and $X$ that satisfy the given growth conditions, we now specialize to the subclass of compound Poisson distributions, over which it is always possible to determine the independent increment $X$. Not too much is sacrificed in narrowing the focus to this case, since a nonnegative infinitely divisible random variable $Y$ has a compound Poisson distribution if and only if $P(Y = 0) > 0$.



5.2.2 Compound Poisson distribution 

One important subfamily of the infinitely divisible distributions are the compound Poisson distributions, 
that is, those distributions that are given by

$$Y = \sum_{i=1}^N Z_i, \quad\text{where } N \sim \mathrm{Poisson}(\lambda), \text{ and } \{Z_i\}_{i=1}^\infty \text{ are independent and distributed as } Z. \tag{58}$$

Compound Poisson distributions are popular in several applications, such as insurance mathematics, seismological data modelling, and reliability theory; the reader is referred to [3] for a detailed review.

Although $Z$ is not in general required to be nonnegative, in order to be able to size bias $Y$ we restrict ourselves to this situation. It is straightforward to verify that when the moment generating function $m_Z(\theta) = Ee^{\theta Z}$ of $Z$ is finite, then the moment generating function $m(\theta)$ of $Y$ is given by

$$m(\theta) = \exp(-\lambda(1 - m_Z(\theta))).$$

In particular $m(\theta)$ is finite whenever $m_Z(\theta)$ is finite. As $Y$ in (58) is infinitely divisible the equality (54) holds for some $X$; the following lemma determines the distribution of $X$ in this particular case.
Lemma 5.1. Let $Y$ have the compound Poisson distribution as in (58) where $Z$ is nonnegative and has finite, positive mean. Then

$$Y^s = Y + Z^s$$

has the $Y$ size biased distribution, where $Z^s$ has the $Z$ size bias distribution and is independent of $N$ and $\{Z_i\}_{i=1}^\infty$.

Proof. Let $\phi_V(u) = Ee^{iuV}$ for any random variable $V$. If $V$ is nonnegative and has finite positive mean, using $f(y) = e^{iuy}$ in (1) results in

$$\phi_{V^s}(u) = Ee^{iuV^s} = \frac{1}{EV}E\left(Ve^{iuV}\right) = \frac{1}{i\,EV}\phi_V'(u). \tag{59}$$

It is easy to check that the characteristic function of the compound Poisson $Y$ in (58) is given by

$$\phi_Y(u) = \exp(-\lambda(1 - \phi_Z(u))), \tag{60}$$

and letting $EZ = \vartheta$, that $EY = \lambda\vartheta$. Now applying (59) and (60) results in

$$\phi_{Y^s}(u) = \frac{1}{i\lambda\vartheta}\phi_Y'(u) = \frac{1}{i\vartheta}\phi_Y(u)\phi_Z'(u) = \phi_Y(u)\phi_{Z^s}(u).$$

□
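For an integer-valued claim size the statement of the lemma can be checked numerically: the size biased mass function of $Y$ should equal the convolution of the mass function of $Y$ with the size biased mass function of $Z$. The sketch below computes the compound Poisson mass function by conditioning on $N$ (deliberately not by a recursion that would already encode the identity) and provides the two ingredients of the comparison. The function names, the truncation levels `size` and `n_max`, and the example distribution of $Z$ are assumptions of this illustration.

```python
import math

def convolve(a, b, size):
    """First `size` terms of the convolution of two probability sequences."""
    out = [0.0] * size
    for i, x in enumerate(a):
        if x == 0.0:
            continue
        for j, y in enumerate(b):
            if i + j >= size:
                break
            out[i + j] += x * y
    return out

def compound_poisson_pmf(lam, z_pmf, size, n_max=80):
    """P(Y = k) for k < size, conditioning on N ~ Poisson(lam), truncated at n_max."""
    pmf = [0.0] * size
    z_power = [1.0] + [0.0] * (size - 1)      # law of Z_1 + ... + Z_n for n = 0
    weight = math.exp(-lam)                   # P(N = 0)
    for n in range(n_max + 1):
        for k in range(size):
            pmf[k] += weight * z_power[k]
        z_power = convolve(z_power, z_pmf, size)
        weight *= lam / (n + 1)               # P(N = n + 1)
    return pmf

def size_bias(pmf):
    """Size biased version k * p_k / mean of a pmf on the nonnegative integers."""
    mean = sum(k * p for k, p in enumerate(pmf))
    return [k * p / mean for k, p in enumerate(pmf)]
```

With, say, `z_pmf = [0.0, 0.5, 0.3, 0.2]` and `lam = 2.0`, comparing `size_bias(y_pmf)` with `convolve(y_pmf, size_bias(z_pmf), size)` entry by entry shows agreement up to floating point and truncation error.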

To illustrate Lemma 5.1 consider the Cramér-Lundberg model [10] from insurance mathematics. Suppose an insurance company starts with an initial capital $u_0$, and premium is collected at the constant rate $a$. Claims arrive according to a homogeneous Poisson process $\{N_\tau\}_{\tau \ge 0}$ with rate $\lambda$, and the claim sizes are independent with common distribution $Z$. The aggregate claims $Y_\tau$ made by time $\tau \ge 0$ is therefore given by (58) with $N$ and $\lambda$ replaced by $N_\tau$ and $\lambda\tau$, respectively.

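Distributions for $Z$ which are of interest for applications include the Gamma, Weibull, and Pareto, among others. For concreteness, if $Z \sim \mathrm{Gamma}(\alpha, \beta)$ then $Z^s \sim \mathrm{Gamma}(\alpha + 1, \beta)$, and the mean $\nu$ of the increment $Z^s$, and the mean $\mu_\tau$ and variance $\sigma_\tau^2$ of $Y_\tau$, are given by

$$\nu = (\alpha + 1)\beta, \quad \mu_\tau = \lambda\tau\alpha\beta \quad\text{and}\quad \sigma_\tau^2 = \lambda\tau\beta^2\alpha.$$

The conditions of Theorem 5.2 are satisfied with any $\gamma \in (0, 1/\beta)$ since $E(e^{\theta Y}) < \infty$ and $E(Z^s e^{\theta Z^s}) < \infty$ for all $\theta < 1/\beta$. Taking $\gamma = 1/(M\beta)$ for $M > 1$ for example, yields

$$C = E(Z^s e^{\gamma Z^s}) = (\alpha + 1)\beta\left(\frac{M}{M - 1}\right)^{\alpha + 2}.$$

For instance, the lower tail bound of Theorem 5.2 now yields a bound on the probability that the aggregate claims by time $\tau$ will be 'small', of

$$P\left(\frac{Y_\tau - \mu_\tau}{\sigma_\tau} \le -t\right) \le \exp\left(-\frac{t^2}{2(\alpha + 1)}\right).$$

It should be noted that in some applications one may be interested in $Z$ which are heavy tailed, and hence do not satisfy the conditions in Theorem 5.2.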



References 

[1] Baldi, P., Rinott, Y. and Stein, C. (1989). A normal approximation for the number of local maxima of a random function on a graph. Probability, Statistics and Mathematics, Papers in Honor of Samuel Karlin, T. W. Anderson, K.B. Athreya and D. L. Iglehart eds., Academic Press, 59-81.






[2] Barbour, A.D. and Chen, L.H.Y. (2005). An Introduction to Stein's Method, Chen, L.H.Y. and Barbour, A.D. eds, Lecture Notes Series No. 4, Institute for Mathematical Sciences, National University of Singapore, Singapore University Press and World Scientific 2005, 1-59.

[3] Barbour, A.D. and Chryssaphinou, O. (2001). Compound Poisson approximation: A user's guide, Ann. Appl. Probab., 11, 964-1002.

[4] Barbour, A.D., Holst, L., and Janson, S. (1992). Poisson Approximation, Oxford University Press. 

[5] Barbour, A.D., Karonski, M. and Rucinski,A.(1989). A central limit theorem for decomposable 
random variables with applications to random graphs, J. Combinatorial Theory B, 47, 125-145. 

[6] Bayer, D. and Diaconis, P. (1992). Trailing the Dovetail Shuffle to its Lair. Ann. Appl. Probab., 2, 294-313.

[7] Chatterjee, S.(2007). Stein's method for concentration inequalities, Probab. Theory Related Fields, 
138, 305-321. 

[8] Chen, L.H.Y (1975). Poisson approximation for dependent trials, Ann. Probab., 3, 534-545. 

[9] Chatterjee, S.(2008) A new method of normal approximation. Ann. Probab., 4, 1584-1610. 

[10] Embrechts, P. and Kluppelberg, C. (1993). Some aspects of insurance mathematics, Th. Probab. Appl., 38, 262-295.

[11] Englund, G. (1981). A remainder term estimate for the normal approximation in classical occupancy, Ann. Probab., 9, 684-692.

[12] Feller, W. (1966). An Introduction to Probability and its Applications, volume II. Wiley.

[13] Goldstein, L. (2005). Berry Esseen bounds for combinatorial central limit theorems and pattern occurrences, using zero and size biasing. Journal of Applied Probability, 42, 661-683.

[14] Goldstein, L. and Penrose, M.(2008). Normal approximation for coverage models over binomial 
point processes, preprint. 

[15] Goldstein, L. and Rinott, Y.(1996). Multivariate normal approximations by Stein's method and 
size bias couplings. Journal of Applied Probability, 33,1-17. 

[16] Goldstein, L. and Zhang, H. (2009). A Berry Esseen theorem for the lightbulb problem. Preprint 

[17] Hall, P. (1988). Introduction to the theory of coverage processes, John Wiley, New York. 

[18] Kolchin, V.F., Sevast'yanov, B.A. and Chistyakov, V. P. (1978). Random Allocations, Winston, Washington D.C.

[19] Kordecki, W. (1990). Normal approximation and isolated vertices in random graphs, Random Graphs '87, Karonski, M., Jaworski, J. and Rucinski, A. eds., John Wiley & Sons Ltd., 1990, 131-139.

[20] Ledoux, M.(2001). The concentration of measure phenomenon, Amer. Math. Soc, Providence, RI. 

[21] Midzuno, H. (1951). On the sampling system with probability proportionate to sum of sizes. Annals of the Institute of Statistical Mathematics, 2, 99-108.

[22] Moran, P. A. P. (1973). The random volume of interpenetrating spheres in space, J. Appl. Probab., 10, 483-490.

[23] O'Connell, N. (1998). Some large deviation results for sparse random graphs, Probab. Th. Rel. Fields, 
110, 277-285. 






[24] Penrose, M.(2003). Random geometric graphs, Oxford University Press, Oxford. 

[25] Penrose, M.(2009). Normal approximation for isolated balls in an urn allocation model, preprint. 

[26] Penrose, M.D. and Yukich, J. E. (2001). Central limit theorems for some graphs in computational 
geometry, Ann. Appl. Probab., 11, 1005-1041. 

[27] Quine, M.P. and Robinson, J. (1982). A Berry Esseen bound for an occupancy problem, Ann. Probab., 10, 663-671.

[28] Raic, M.(2007). CLT related large deviation bounds based on Stein's method. Adv. Appl. Prob., 39, 
731-752. 

[29] Rao, C.R., Rao, B.M., and Zhang, H.(2007). One Bulb? Two Bulbs? How Many Bulbs Light Up? 
A Discrete Probability Problem Involving Dermal Patches, Sankhya, 69, pp. 137-161. 

[30] Reinert, G. and Rollin, A. (2008). Multivariate normal approximation with Stein's method of exchangeable pairs under a general linearity condition, Ann. Probab., to appear.

[31] Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum 
of dependent random variables, Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2, 583-602, Univ. 
California Press, Berkeley. 

[32] Stein, C. (1986). Approximate Computation of Expectations. Institute of Mathematical Statistics, 
Hayward, CA. 

[33] Steutel, W.F.(1973). Some recent results in infinite divisibility. Stoch. Proc. Appl, 1, 125-143. 

[34] Zhou, H. and Lange, K. (2009). Composition Markov chains of multinomial type. Advances in Applied 
Probability 






